The creator of ChatGPT, OpenAI, has announced its plan to improve the mathematical reasoning of artificial intelligence (AI) chatbots as a strategy to reduce the rate of hallucinations.

OpenAI’s battle against AI hallucinations

OpenAI released ChatGPT in November last year, after which the chatbot took the world by storm due to its capabilities and wealth of knowledge. The company has since developed GPT-4, which is more advanced and gives more up-to-date responses.

However, despite the advancements, generative AI applications have continued to suffer from ‘hallucinations’ where they provide false information that lacks support from real-world facts or sources.

OpenAI has acknowledged these issues saying “Even state-of-the-art models are prone to producing falsehoods —they exhibit a tendency to invent facts in moments of uncertainty.”

The AI research organization added:

“These hallucinations are particularly problematic in domains that require multi-step reasoning since a single logical error is enough to derail a much larger solution. Detecting and mitigating hallucinations is essential to improve reasoning capabilities.”

Following numerous complaints from users, OpenAI attached a disclaimer to the application saying, “ChatGPT may produce inaccurate information about people, places, or facts,” while it kept looking for ways to correct the problem.

After in-depth research, OpenAI has finally announced that it has uncovered a way to fight hallucinations in chatbots. According to the announcement, the tech company looked into two techniques, outcome supervision and process supervision, before settling on the latter.

Reward models trained with outcome supervision provide feedback based only on the final result, whereas those trained with process supervision provide feedback on each individual step in a chain of thought.
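The difference between the two feedback schemes can be illustrated with a toy sketch. This is not OpenAI's implementation; the example solution, the labels, and the function names (`outcome_feedback`, `process_feedback`, `first_error`) are all hypothetical and chosen only to show the shape of each signal.

```python
from typing import List

# Hypothetical toy data: a three-step solution to 2 * (3 + 4),
# with human-style labels marking each step correct (True) or not (False).
steps = [
    "3 + 4 = 7",
    "2 * 7 = 15",  # arithmetic slip: the correct product is 14
    "Answer: 15",
]
step_labels = [True, False, False]
final_answer_correct = False


def outcome_feedback(final_correct: bool) -> List[float]:
    """Outcome supervision: one signal for the whole solution."""
    return [1.0 if final_correct else 0.0]


def process_feedback(labels: List[bool]) -> List[float]:
    """Process supervision: one signal per reasoning step."""
    return [1.0 if ok else 0.0 for ok in labels]


def first_error(labels: List[bool]) -> int:
    """Index of the first step labeled incorrect, or -1 if there is none."""
    for i, ok in enumerate(labels):
        if not ok:
            return i
    return -1


print(outcome_feedback(final_answer_correct))
print(process_feedback(step_labels))
print(first_error(step_labels))
```

With outcome feedback, the only information available is that the final answer was wrong. With per-step feedback, the signal also shows that the first step was fine and that the reasoning went off course at the second step.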

The trained models were then tested on the MATH dataset, where, according to OpenAI, the model trained using process supervision performed significantly better. Unlike outcome supervision, this approach directly rewards the model for following a human-approved process and for paying attention to every step along the way.

Although OpenAI admitted that results outside of mathematics are still unknown, it suggested that process supervision could offer a better combination of performance and alignment than outcome supervision if the observed results hold in wider contexts.

To aid further research, the company made its entire collection of process supervision data available to the public, encouraging investigation and study in this field.

ChatGPT’s Lies

While the exact reason OpenAI undertook this research is not known, hallucinations by chatbots have so far had very negative effects on users as well as companies.

For instance, in a demonstration for reporters, Microsoft’s ChatGPT-like Bing search technology examined financial reports from Gap and Lululemon. When its responses were compared with the actual reports, the chatbot was found to have underreported some figures, while others appeared to be fabricated.

This drew a lot of criticism from attendees, including independent search researcher Dmitri Brereton, who wrote:

“I am stunned that the Bing team made this pre-recorded demo packed with wrong information and confidently showed it to the world as if it were impressive. I am even more surprised that this tactic worked, and everyone jumped on the Bing AI bandwagon without doing any research.”

In another case, Jonathan Turley, an American criminal defense lawyer and law professor, claimed that ChatGPT accused him of committing sexual assault. Not only did the chatbot falsely accuse him, but it also backed the claim with a fabricated Washington Post article.
