Google has officially shown off its hotly anticipated AI model, Gemini, which it claims is more powerful than OpenAI’s GPT-4 large language model (LLM) and can rival “expert level” humans on an intelligence test.
The model, which CEO Sundar Pichai first unveiled during the company’s I/O conference in May, will come in three flavors: Nano, Pro, and Ultra, with Ultra being the most capable and the one that can allegedly edge out GPT-4.
A “fine-tuned version of Gemini Pro” has already been integrated into Google’s Bard chatbot, according to an official blog post. Ultra is still undergoing “extensive trust and safety checks” and “fine-tuning” but will eventually be built into Bard as well “early next year.”
But beyond some simple video demonstrations the company shared today, that’s more or less all we know about Google’s latest AI. We don’t know, for instance, how many parameters it has or what data it was trained on, making an accurate apples-to-apples comparison with competing AI models impossible.
That’s not to mention the glaring lack of a commonly agreed-upon, industry-standard benchmark to measure the intelligence of any AI model.
Basically, we’ve heard some marketing fluff — and now the company has a lot to prove.
Gemini is “natively multimodal,” which means it can “combine different types of information including text, code, audio, image and video,” according to the blog post.
In a series of videos, Google showed off Gemini’s capabilities. One video shows the model correctly identifying a drawing of a blue duck. Another shows the AI reading a student’s answers to math questions and explaining why the student was right or wrong.
But how does it really compare to OpenAI’s competing GPT-4?
In its announcement today, the company claimed its Gemini Ultra model scored 90 percent on an MMLU (massive multitask language understanding) test, which “uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.”
That makes it “the first model to outperform human experts” on the test, who score 89.8 percent.
GPT-4 scored 86.4 percent on the same test, according to Google. However, the mid-range Gemini Pro model, the one that has been integrated into Google’s Bard chatbot, notably only managed to beat GPT-3.5.
In other words, it’s still an extremely tight race, going by the data Google released today.
During a press briefing, Eli Collins, vice president of product at Google DeepMind, offered only an “I suspect it does” when asked whether Gemini has any new capabilities compared to current-gen LLMs, a vague answer at best.
Collins also said that Gemini went through the “most comprehensive safety evaluations” of any AI model the company has built to date, which could explain why Google reportedly had to postpone its launch.
The company also confirmed it won’t release the model’s parameter count, making direct comparisons all the more difficult.
Beyond Gemini, Google also showed off its “experimental” Search Generative Experience earlier this year, which is meant to enhance the company’s search engine results. But given what we’ve seen so far, the tool leaves a lot to be desired.
In short, the jury is still very much out on where Gemini stands compared to its fierce competition.
And that’s unlikely to change any time soon. As of today, Bard users can only use text prompts, with image and audio interaction set to be released “in coming months,” as Collins told reporters.
Even Gemini Ultra, which Google claims can outperform GPT-4, still isn’t ready for prime time, meaning we’ll have to wait before drawing any definitive conclusions.
More on Google and AI: Former Google CEO Warns AI Could Endanger Humanity Within Five Years