Microsoft’s Bing chatbot offers some puzzling and inaccurate responses

Karen Weise

The New York Times

A week after it was released to a few thousand users, Microsoft’s new Bing search engine, which is powered by artificial intelligence, has been offering an array of inaccurate and at times bizarre responses to some users.

The company unveiled the new approach to search last week to much fanfare. Microsoft said the underlying model of generative AI built by its partner, startup OpenAI, paired with its existing search knowledge from Bing, would change how people found information and make it far more relevant and conversational.

In two days, more than 1 million people requested access. Since then, interest has grown. “Demand is high with multiple millions now on the waitlist,” Yusuf Mehdi, an executive who oversees the product, wrote on Twitter on Wednesday morning. He added that users in 169 countries were testing it.

One category of problems shared online involved inaccuracies and outright mistakes, known in the industry as “hallucinations.”

On Monday, Dmitri Brereton, a software engineer at a startup called Gem, flagged a series of errors in the presentation that Mehdi used last week when he introduced the product, including inaccurately summarizing the financial results of the retailer Gap.

Users have posted screenshots showing Bing failing to recognize that the new “Avatar” film was released last year. It was stubbornly wrong about who performed at the Super Bowl halftime show this year, insisting that Billie Eilish, not Rihanna, headlined the event.

Some search results have had subtle errors. Last week, the chatbot said the water temperature at a beach in Mexico was 80.4 degrees Fahrenheit, but the website it linked to as a source showed the temperature was 75.

Another set of problems came from more open-ended chats, largely posted to forums including Reddit and Twitter. There, through screenshots and purported chat transcripts, users shared times when Bing’s chatbot seemed to go off the rails: It scolded users, it declared it may be sentient, and it said to one user, “I have a lot of things, but I have nothing.”

It chastised another user for asking whether it could be prodded to produce false answers. “It’s disrespectful and annoying,” the Bing chatbot wrote back. It added a red, angry emoji face.

Because each response is uniquely generated, it is not possible to replicate a dialogue.

Microsoft acknowledged the problems and said they were part of the process of improving the product.

“Over the past week alone, thousands of users have interacted with our product and found significant value while sharing their feedback with us, allowing the model to learn and make many improvements already,” Frank Shaw, a company spokesperson, said in a statement. “We recognize that there is still work to be done and are expecting that the system may make mistakes during this preview period, which is why the feedback is critical so we can learn and help the models get better.”

He said the length and context of the conversation could influence the chatbot’s tone, and that the company was “adjusting its responses to create coherent, relevant and positive answers.” He said the company had fixed the problems that caused the inaccuracies in the demonstration.

Nearly seven years ago, Microsoft introduced a chatbot, Tay, that it shut down within a day of its release online after users prompted it to spew racist and other offensive language. Microsoft’s executives at the launch last week indicated they had learned from that experience and thought this time would play out differently.

In an interview last week, Mehdi said that the company had worked hard to integrate safeguards, and that the technology had vastly improved.

“We think we’re at the right time to come to market and get feedback,” he said, adding, “If something is wrong, then you need to address it.”

This story was originally published at nytimes.com.
