Q&A: What happened when a chatbot provider upgraded to ChatGPT?

ChatGPT is a chatbot sensation that virtually every enterprise — large and small — is considering using to create greater business efficiencies. Five days after its November launch, it had 1 million users. Within two months, the AI chatbot had reached an estimated 100 million users.

The machine learning language program created by San Francisco-based research firm OpenAI offers human-like text responses to queries. It can summarize long articles or text-thread conversations, make writing suggestions, come up with marketing campaigns, produce business plans and even create or correct computer code. And all that can be done with a limited investment.

Microsoft, which has invested billions of dollars in OpenAI and reportedly holds a 49% stake in the company, just launched a version of its Bing search engine based on GPT-4, the latest version of the OpenAI large language model family that also underpins ChatGPT. Not to be outdone, Google recently announced its own experimental AI chatbot.

Intercom, a customer support software provider whose chatbots are used by 25,000 enterprises globally, including Atlassian, Amazon and Lyft Business, is at the bleeding edge of ChatGPT use. It just upgraded its AI-enabled messaging bot using ChatGPT's language model.

Fergal Reid, director of machine learning at Intercom.

Fergal Reid, director of machine learning at Intercom, said there are undeniable advantages to using ChatGPT over the company's original customer service chatbot, which it has been selling since 2018. The chatbot software is used to help client service reps answer customer questions. Intercom also sells a separate product called Resolution Bot that organizations can embed on their websites to offer automated answers to end-user questions.

But ChatGPT has glitches that cannot be overlooked, Reid cautioned. Because of those issues, the new chatbot, already being used by hundreds of Intercom customers today, remains in beta until the problems can be ironed out.

Nevertheless, Intercom's customers who've tested it are praising the updated chatbot (based on OpenAI's GPT-3.5 language model), saying it has made their jobs easier.

Reid spoke to Computerworld about the process of customizing the ChatGPT software for his company's business use, how it offers business value, and the challenges he and his machine learning team faced and continue to face.

The following are excerpts from that interview:

Tell me about your company and why you felt it needed to upgrade its existing chatbot? “Our business is customer service, basically. We produce a messenger so if someone has a customer support or service issue, they come onto a business website and start typing into the messenger and it’s like WhatsApp chat. You’ve probably seen these messengers pop up in the bottom righthand corner of a website.

“Intercom is a leader and one of the first companies to pioneer business messengers like that. So, we have this messenger, and then we build a whole customer support platform for support reps — we call them teammates — whose job is to answer customer support questions again and again, day in and day out.

“We saw [that] ChatGPT… just crossed another level of being able to deal with random, noisy conversations. Humans, when they ask questions, can phrase them in surprising ways. People will refer back to something said a couple of turns earlier in the conversation. That’s been hard for traditional machine learning systems to deal with, but OpenAI’s new tech just seems to be doing much better with that.

“We sort of played with ChatGPT and GPT-3.5… and we were like, ‘Wow, this is a big deal. This is going to unlock much more powerful features for our bots and for our teammates than we had before.’”

How long did it take to create and roll out the chatbot product? “So, internally, we had our first prototype demos in the second week of January after starting work on the product at the beginning of December.

“That’s a pretty fast development cycle. We had maybe 108 customers live with it in beta around mid-January and then had another [beta] release at the end of January. Now, we’re in open beta, so we have hundreds more Intercom customers who are using it. We’re still calling it a beta, even though people are using it in production in their jobs day in and day out, because it’s so new.

“This is a machine for building compelling demos that don’t currently deliver real value. So, there’s a lot of work to do to understand how much real value is being delivered here.

“The open APIs that OpenAI offers are very expensive compared to any normal API you might use. It costs money every time you ask the model to do something; getting it to summarize something might cost you five or 10 cents, and that gets very expensive. It’s a lot cheaper than paying a human to do it themselves, but this is something businesses are going to have to figure out: How do we price this?

“That’s another reason we’re still in beta. We’re like everyone else. We’re still learning about the economics of this. We are convinced it saves people more time than it costs in terms of the computer processing costs, but how do all those economics work out and what’s the right way to build for these things?”
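To make those economics concrete, here is a back-of-the-envelope sketch. The figures are illustrative assumptions, not Intercom’s: text-davinci-003 was list-priced at roughly $0.02 per 1,000 tokens at the time, and the request volume is invented for the example.

    # Back-of-the-envelope cost estimate for API-based summarization.
    # All figures are illustrative assumptions, not Intercom's numbers.
    PRICE_PER_1K_TOKENS = 0.02   # approximate text-davinci-003 list price (USD)
    TOKENS_PER_SUMMARY = 3_000   # assumed conversation prompt + generated summary
    SUMMARIES_PER_DAY = 10_000   # assumed volume across a customer base

    cost_per_summary = PRICE_PER_1K_TOKENS * TOKENS_PER_SUMMARY / 1_000
    daily_cost = cost_per_summary * SUMMARIES_PER_DAY

    print(f"~${cost_per_summary:.2f} per summary")  # ~$0.06, in line with "5 or 10 cents"
    print(f"~${daily_cost:,.0f} per day")           # ~$600 per day at this volume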

What’s the difference between ChatGPT and OpenAI's GPT-3.5 large language model? Did you work with them separately to create your chatbot? “I would consider ChatGPT more of a front end to the GPT-3.5 models. But, yeah, anyone building on ChatGPT is building on the same underlying model, which OpenAI calls GPT-3.5. It’s basically the same. What’s different is the user interface.

“ChatGPT is trained with slightly more guardrails, so if you ask it to do something that it doesn’t want to do, it’ll be like, ‘I’m just a large language model, I can’t do x or y.’ Whereas the underlying language models don’t have those same guardrails. They’re not trained to talk to end users on the Internet. So, anyone building products is using the underlying models rather than the ChatGPT interface. It’s basically the same thing, though, in terms of the sophistication of the understanding and the power of the underlying model.

“The model we’re using, text-davinci-003, was released by OpenAI the same day as ChatGPT, so that’s basically what everyone is working with.”

Did you have a choice in what you were going to build? Was there another large language model from a third party you could have used to build your new chatbot? “Not so much. ChatGPT, at the moment, is one application of these models that OpenAI hosts. No one apart from OpenAI can literally use ChatGPT itself; technically, ChatGPT is OpenAI’s service for the general public that lives on OpenAI’s website. So, for anyone building ‘ChatGPT things,’ it’s more technically correct to say they’re using the same OpenAI models that power ChatGPT.”

Is ChatGPT being used for the same tasks as your Resolution Bot product? “The features we shipped initially are features facing the support rep and not end users. We have chatbots that face end users, and then we have machine learning-based productivity features that face the support rep. The initial thing we shipped has features to make the support rep better; it's not for the end-user.

“The reason we did this is because a lot of the current machine learning models from OpenAI suffer a lot from what’s called hallucinations. If you ask them for an answer to a question and they don’t have that answer, they will fairly frequently just make something up.

“Think of them almost like their job is generating a plausible next completion. It’s not really to ensure what they give you is factual. They make things up. So, we were initially reluctant to just put them in front of end users, answering end users’ questions. We were worried, and I still am worried, that our customers would feel uncomfortable with bots that make things up. And our early tests showed that naively putting a fully GPT-powered bot in front of customers is a really bad idea. We continue to work on that, and we think there are solutions for it in the future.”

How can this tool be useful for support reps if it makes things up? “While we’re working in this area, and of course have internal R&D prototypes, we have nothing we’ve named or committed to releasing at this point.

“We initially shipped [our chatbot] only to support the support reps because they’ll generally know what the right answers are, and [the chatbot] can still make them faster and more efficient because they don’t have to type it in themselves 90% of the time. And then 10% of the time, when there’s a minor hallucination or inaccuracy, they can just go and fix that.

“So, it becomes slightly more like an interface. If you use Google Docs or any predictive text feature that gives you suggestions, it’s OK if the suggestions are wrong sometimes, but when they are right they’ll speed you up and make you more efficient. That’s what we initially shipped, and we had hundreds of customers in beta by the end of January. And we had a really good launch with that. We got a lot of really good, positive feedback on the new features. It made support reps more efficient, and we’re seeing a lot of volume with it.

“As the reps write, it lets them rephrase the text, but it’s not automatically sending anything to the end user. It’s designed to empower the teammate and make them faster.”
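The human-in-the-loop pattern Reid describes can be sketched in a few lines of Python. The function names and flow here are hypothetical illustrations, not Intercom’s code; the key point is that model output is only ever a suggestion until a rep approves it.

    # Illustrative sketch of a human-in-the-loop suggestion flow.
    # model_generate, rep_review, and send_to_end_user are hypothetical
    # placeholders for the model call, the rep's UI, and the messenger.

    def draft_reply(conversation: str, model_generate) -> str:
        """Ask the model for a suggested reply; never send it directly."""
        return model_generate(
            "Suggest a reply to this support conversation:\n" + conversation
        )

    def handle_conversation(conversation, model_generate, rep_review, send_to_end_user):
        suggestion = draft_reply(conversation, model_generate)
        # The rep sees the suggestion and can accept, edit, or discard it.
        approved_text = rep_review(suggestion)
        if approved_text:  # only rep-approved text ever reaches the end user
            send_to_end_user(approved_text)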

Are there other ChatGPT features that are also powerful for customer reps? “Yes. Another feature we built is summarization. These large language models are excellent at processing existing text and excellent at generating a summary of a big article or [text] conversation. Again, we have a lot of support reps, and they have to hand off conversations when an issue gets too complicated for them. They have to hand it over to a supervisor, and often they’ll be mandated to write a summary of the conversation they had with the end user. We have some reps who tell us sometimes writing the summary takes just as long as responding to the conversation does. But they have to do it.

“So, this technology is excellent at summarizing and condensing text… [the time spent] is radically reduced. So, one of the features we built that we’re most proud of is this summarization feature, and we built that so that you just press a button, and you get a summary of the conversation so far. But then, you can edit it and then send it to whomever you’re escalating it to.

“All the first wave of features are designed to have a human in the loop. The AI is augmenting the human. They [customer service reps] don’t have to spend a few minutes reading through the whole conversation and pulling out the relevant facts. Instead, the AI pulls out the relevant facts, and then the reps just have to approve them, or they can say there was some nuance missing.

“These models are dramatically better than what we had before. But they’re still not perfect. They still occasionally miss nuances. They still don’t understand things that a skilled human rep will.”

How did you modify the ChatGPT software to suit your needs? How does that work? “OpenAI gives us an API that we can send text to and get back the text its model provides. In contrast to the past, you really work with this technology by kind of ‘telling it’ in English what you want it to do. So, you send it a bunch of text like:

“Summarize the following conversation:
Customer: ‘Hi, I have a question.'
Agent: ‘Hi there, what can I help you with today?'”

“You literally send it that text, including what you want it to do, and it sends you text back – in this case, text that contains the summarized version [see image]. And then we process that and present it to the support rep, so they can choose to use it or not.

An example of a customer conversation with a representative that is then summarized by the ChatGPT-powered bot and sent to a customer rep supervisor for further assistance.
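In code, the exchange Reid describes might look something like this minimal sketch, using OpenAI’s completions API and the text-davinci-003 model he names above. The parameters and key handling are simplified assumptions, not Intercom’s production code.

    # Minimal sketch of summarizing a support conversation via OpenAI's
    # completions API. Illustrative only; not Intercom's production code.
    import openai

    openai.api_key = "YOUR_API_KEY"  # in practice, load from secure config

    conversation = (
        "Customer: Hi, I have a question.\n"
        "Agent: Hi there, what can I help you with today?\n"
    )

    response = openai.Completion.create(
        model="text-davinci-003",  # the model named earlier in the interview
        prompt="Summarize the following conversation:\n\n" + conversation,
        max_tokens=200,            # cap the length of the generated summary
        temperature=0.2,           # keep the summary close to the source text
    )

    summary = response["choices"][0]["text"].strip()
    # Present the summary to the support rep, who can edit it before escalating.
    print(summary)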

“One thing people do is use this to summarize emails. An email will often contain all the previous history of the emails underneath, and you can use this to summarize [that email thread]. This works in a way that programming languages in the past haven’t, but you have to pay a lot of attention to get it to do what you want it to do. When you ask it to do something, you have to be very specific…to avoid errors.
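As an illustration of the specificity Reid mentions, compare a loose instruction with a tighter one. These prompts are hypothetical examples, not Intercom’s actual prompts.

    # Hypothetical prompts illustrating the "you have to be very specific" point.
    email_thread = "..."  # the full email thread text to be summarized

    vague_prompt = "Summarize this email."

    specific_prompt = (
        "Summarize the email thread below in at most three sentences. "
        "Include only facts stated in the thread. "
        "If the customer's request is ambiguous, say so instead of guessing.\n\n"
        + email_thread
    )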

“It’s interesting. It’s a different type of technology to use, different from traditional machine learning.”

Did you use your own IT team or software engineers to customize ChatGPT for your uses? And how difficult was that? “At Intercom, we have a very big research and development team — like any software and services tech company. I lead the machine learning team here, and most of the people on the team are deep experts in machine learning, with PhDs in the area — myself included. So, we have a lot of experience training machine learning models and working with them.”


