Google has taken the wraps off Gemini, its largest and most capable AI model yet, which achieves record-setting performance on a wide range of academic benchmarks. Gemini promises a generational leap in capabilities and will underpin enhancements across Google’s product portfolio, from Bard to Pixel phones.
Alphabet (GOOG) Chief Executive Officer Sundar Pichai called Gemini’s launch “one of the biggest science and engineering efforts we’ve undertaken as a company,” while Google DeepMind CEO Demis Hassabis touted its potential to “benefit humanity in incredible ways”.
Gemini Comes in Three Variants to Accommodate Multiple Hardware Capabilities
Gemini 1.0 comes in three sizes: Ultra, Pro, and Nano. Gemini Ultra is the largest and most capable variant, built for highly complex tasks.
At the other end, Gemini Nano is optimized for efficiency, enabling low-latency deployment on smartphones and other edge devices. Gemini Pro sits in between, offering broad versatility across tasks with more modest resource demands at inference time.
This tiered approach lets developers adapt Gemini to very different applications and hardware without training an entirely separate model each time. Aligning the cloud and on-device offerings around one model family can also help streamline product development.
Gemini Achieves Unprecedented Milestones in Various Tests
Google evaluated Gemini 1.0 on 32 widely used academic benchmarks, and Gemini Ultra exceeded previous state-of-the-art results on 30 of them, a result that points to broad, cross-disciplinary strengths.
The tests spanned diverse categories, including language comprehension, mathematical reasoning, image understanding, logical reasoning, and other facets of intelligence. Surpassing the prior state of the art across so many of them suggests Gemini can handle most tasks thrown its way.
One standout result is Gemini Ultra’s performance on MMLU (massive multitask language understanding), a test of world knowledge and problem-solving that spans 57 subjects, including math, physics, history, law, and medicine. Gemini Ultra scored 90.0%, making it the first model to outperform human experts on the benchmark, a sign of increasingly broad, human-like general understanding.
“Nearly eight years into our journey as an AI-first company, the pace of progress is only accelerating: Millions of people are now using generative AI across our products to do things they couldn’t even a year ago”, Pichai said in Google’s official launch announcement.
Google Aims to Find a Balance Between Power and Safety
Text-only benchmarks capture only part of Gemini’s capabilities. Many of its breakthroughs stem from native support for images, video, audio, code, and other data types.
Rather than training separate components for each modality and stitching them together, the standard approach to building multimodal models, Gemini was designed to be natively multimodal, combining these forms of understanding in a single coherent model. This lets it move fluidly between images, text, and sound when dealing with real-world ambiguity.
Pichai stressed that handling diverse inputs effectively unlocks new kinds of usefulness, a key prerequisite for the universally helpful AI assistants he envisions. Fields such as medicine, science, education, and the creative arts stand to gain significantly from Gemini’s versatile multimodal abilities.
Even as Gemini appears to outpace its predecessors on raw capability, Google says responsible development remains its top priority given the associated risks. To this end, its engineers built a tailored evaluation framework to proactively address the model’s unique safety considerations.
For example, toxicity testing now covers multimodal threats such as violent imagery or harmful video content, which were previously overlooked. Pichai said that building models that are both powerful and safe was the goal behind the company’s architectural choices for the product.
“We’re approaching this work boldly and responsibly. That means being ambitious in our research and pursuing the capabilities that will bring enormous benefits to people and society, while building in safeguards and working collaboratively with governments and experts to address risks as AI becomes more capable”, the tech executive highlighted.
Gemini Will Soon Be Integrated Across Google’s Product Stack and Pixel Phones
With Bard as the first high-profile application to launch and test the model, Google aims to deploy Gemini across its ecosystem over the rest of the year and throughout 2024.
This includes a full rollout across its advertising and cloud services, productivity software such as Docs, and consumer offerings such as Maps, Translate, and Lens.
Gemini can also assist programmers, having been trained to understand, explain, and generate code in popular languages including Python, Java, and C++. AlphaCode 2, a new version of the firm’s specialized coding system built on a version of Gemini, reportedly performed better than an estimated 85% of participants in competitive programming contests, which involve not just relatively basic coding but also complex math and computer science problems.
By comparison, the original AlphaCode outperformed an estimated 50% of participants.
Gemini Pro rolls out starting today to improve Bard’s overall user experience in more than 170 countries. It is initially available only in English, with other languages to follow soon.
Google also plans to bring Gemini to its Pixel phones, starting with the Pixel 8 Pro, which will use Gemini Nano to power features such as summaries and suggested message replies.
“These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year”, Pichai emphasized.
He concluded: “This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company. I’m genuinely excited for what’s ahead, and for the opportunities Gemini will unlock for people everywhere.”