Artificial Intelligence (AI) seems to be gathering momentum and improving daily.
Recent research from Refuel found that large language models can label data about 20 times faster and 7 times cheaper than human annotators in a side-by-side comparison.
This latest discovery has sparked discussions within the tech industry about whether AI should be granted the ability to self-train.
AI’s Annotation Evaluation Reveals Shocking Results
Large language models (LLMs) are one of the core elements in AI software. Since their debut in 2018, LLMs have been used as automation tools to perform human-like functions.
These artificial neural networks have been tasked with writing poems, taking exams, and, like their human counterparts, labeling datasets for specialized domains.
Although numerous studies have examined the effectiveness of LLMs as data annotators, one crucial aspect that remained unexplored was their performance evaluation in highly technical domains.
However, the Refuel team’s recent study has addressed this gap, providing valuable insights into this previously unexplored area.
At @RefuelAI, we set out to evaluate the performance of LLMs like GPT-4, PaLM-2 and open source models for autolabeling datasets across a range of NLP tasks. Excited to share our learnings so far, and next steps with the community in this pic.twitter.com/AmOsTrHBJI
— Nihit Desai (@nihit_desai) June 16, 2023
According to the AI-focused organization, state-of-the-art LLMs can label datasets with quality equal or superior to that of skilled human annotators.
Moreover, LLMs accomplish this task about 20 times faster and about seven times cheaper than their human counterparts.
To substantiate this claim, the Refuel team conducted a comprehensive study of several cutting-edge LLMs, including OpenAI’s GPT-4 and GPT-3.5, Google’s PaLM-2, Anthropic’s claude-v1, and flan-t5-xxl, an open-source model hosted on Hugging Face.
The study involved hiring a group of human data annotators from a third-party platform and splitting them into two groups.
One group was assigned to label a seed dataset of 200 examples using labeling guidelines, while the second group was given a seed dataset of 2,000 examples to work on.
Upon analyzing the results, Refuel found that OpenAI’s GPT-4 emerged as the clear winner, surpassing both the human annotators and the other LLMs in the contest.
Refuel reported that the GPT-4 LLM achieved an impressive 88.4% agreement with ground truth labels. This is higher than human annotators’ 86% agreement with ground truth labels.
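To make the "agreement with ground truth" figure concrete, here is a minimal sketch of how such a score can be computed. The function name `agreement_rate` and the toy labels are hypothetical, chosen only for illustration; they are not taken from Refuel's study or tooling.

```python
from typing import Sequence


def agreement_rate(predicted: Sequence[str], ground_truth: Sequence[str]) -> float:
    """Return the fraction of examples whose predicted label matches the reference label."""
    if len(predicted) != len(ground_truth):
        raise ValueError("label lists must have the same length")
    if not ground_truth:
        raise ValueError("need at least one example")
    matches = sum(p == g for p, g in zip(predicted, ground_truth))
    return matches / len(ground_truth)


# Hypothetical toy labels, not data from the study.
ground_truth = ["spam", "ham", "spam", "ham", "spam"]
llm_labels = ["spam", "ham", "spam", "spam", "spam"]
human_labels = ["spam", "ham", "ham", "spam", "spam"]

llm_agreement = agreement_rate(llm_labels, ground_truth)      # 4 of 5 labels match
human_agreement = agreement_rate(human_labels, ground_truth)  # 3 of 5 labels match
```

On a real benchmark the same calculation would run over thousands of examples per dataset, which is how figures such as 88.4% and 86% can be compared directly.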
The other LLMs examined in the study also performed well. Most averaged above 80% agreement while incurring only about one-tenth of GPT-4’s API cost.
The study also flags a limitation. Hallucinations pose a significant obstacle to the effectiveness of AI tools.
This is because these tools often display confidence in their responses, even when those responses are factually incorrect.
When comparing completion rates and confidence estimations, GPT-4 outperformed the other LLMs, with a reported labeling score of 100% and 3 of the 8 datasets labeled correctly.
Meanwhile, the other LLMs obtained an average score of 50%, though at a significantly lower API cost than GPT-4.
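One common way to relate confidence estimates to completion rate is to keep only the labels whose confidence clears a threshold, then measure how many examples remain and how accurate they are. The sketch below assumes each model output comes with a confidence score between 0 and 1; the function `evaluate_at_threshold` and the sample records are hypothetical and do not reproduce Refuel's method or data.

```python
from typing import List, Optional, Tuple

# Each record is (predicted_label, confidence, ground_truth_label).
Record = Tuple[str, float, str]


def evaluate_at_threshold(records: List[Record], threshold: float) -> Tuple[float, Optional[float]]:
    """Return (completion_rate, agreement) for labels with confidence >= threshold.

    completion_rate is the share of examples the model is allowed to label;
    agreement is the share of those kept labels that match the ground truth,
    or None when no label clears the threshold.
    """
    if not records:
        raise ValueError("need at least one record")
    kept = [(pred, truth) for pred, conf, truth in records if conf >= threshold]
    completion_rate = len(kept) / len(records)
    if not kept:
        return completion_rate, None
    agreement = sum(pred == truth for pred, truth in kept) / len(kept)
    return completion_rate, agreement


# Hypothetical toy records, not data from the study.
records = [
    ("positive", 0.95, "positive"),
    ("negative", 0.90, "negative"),
    ("positive", 0.55, "negative"),  # wrong label with low confidence
    ("negative", 0.85, "negative"),
]

all_labels = evaluate_at_threshold(records, threshold=0.0)      # keep everything
confident_only = evaluate_at_threshold(records, threshold=0.8)  # drop low-confidence labels
```

Raising the threshold trades completion rate for label quality: here the model labels fewer examples, but the ones it keeps agree with the ground truth more often.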
This eye-opening study raises questions about the suitability of using these LLMs to train the next generation of AI software.
POET Set to Help AIs Train AIs
Since the emergence of AI, humans have played a crucial role in shaping and advancing this innovative technology.
Some scientists have dedicated extensive effort to making these machines more human-like in their intelligence, while others aim to make them exceed human capabilities.
However, a shift is underway, with the focus gradually turning towards the machines themselves for insights and their continued evolution.
A team of AI researchers at Uber is making significant progress in this direction, paving the way for the next phase of AI development.
Their platform, known as Paired Open-Ended Trailblazer (POET), serves as a training ground for AI bots.
Without human intervention, POET creates obstacle courses for virtual bots to navigate, evaluates their performance, and assigns new tasks autonomously.
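The create, evaluate, and reassign cycle described here can be illustrated with a deliberately tiny toy loop. This is not the POET algorithm, which maintains a population of environment and agent pairs and transfers agents between them. The sketch assumes a single agent whose "skill" is one number and courses whose "difficulty" is one number, and every name in it is hypothetical.

```python
from typing import List, Tuple


def run_curriculum(iterations: int,
                   skill_gain: float = 0.5,
                   difficulty_step: float = 1.0) -> Tuple[float, List[float]]:
    """Toy open-ended loop: train, evaluate, and generate a harder course once solved."""
    skill = 0.0      # stand-in for the agent's ability
    courses = [1.0]  # difficulty of each generated obstacle course
    for _ in range(iterations):
        current = courses[-1]
        skill += skill_gain  # "training": a stand-in for real optimization
        if skill >= current:  # "evaluation": the agent clears the current course
            courses.append(current + difficulty_step)  # "new task": a harder course
    return skill, courses


final_skill, courses = run_curriculum(6)
```

No human picks the next course in this loop; the difficulty rises only when the agent's own performance warrants it, which is the idea the article attributes to POET.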
The researchers at Uber AI Labs believe that POET has the potential to bring about a paradigm shift in AI training, and that it could even assume the responsibility of training AIs itself.
We are very proud that @Uber AI Labs' Rui Wang, Joel Lehman, Jeff Clune, and Ken Stanley won a Best Paper Award @GECCO2019 for POET, the ML algorithm released earlier this year!
More about POET: https://t.co/taI5W2Rzce
Read the paper: https://t.co/SuMo9KSqhu #ML #AI #DeepLearning
— Uber Engineering (@UberEng) August 6, 2019
Providing context on why this might be the step forward, renowned AI researcher Jeff Clune emphasizes the need for humans to step aside and allow AI to take on the tasks at hand.