This week, Google took the wraps off Gemini, its new flagship generative AI model meant to power a range of products and services including Bard, Google’s ChatGPT competitor. In blog posts and press materials, Google touted Gemini’s superior architecture and capabilities, claiming that the model meets or exceeds the performance of other leading gen AI models like OpenAI’s GPT-4.
But the anecdotal evidence suggests otherwise.
A “lite” version of Gemini, Gemini Pro, began rolling out to Bard yesterday, and it didn’t take long for users to start voicing their frustrations with it on X (formerly Twitter).
The model fails to get basic facts right, like 2023 Oscar winners:
Note that Gemini Pro incorrectly claims that Brendan Gleeson won Best Actor last year; the actual winner was Brendan Fraser.
I tried asking the model the same question and, bizarrely, it gave a different wrong answer:
“Navalny,” not “All the Beauty and the Bloodshed,” won Best Documentary Feature last year; “All Quiet on the Western Front” won Best International Film; “Women Talking” won Best Adapted Screenplay; and “Pinocchio” won Best Animated Feature Film. That’s a lot of mistakes.
Science fiction author Charlie Stross found many more examples of confabulation in a recent blog post. (Among other mistruths, Gemini Pro said that Stross contributed to the Linux kernel; he never has.)
Translation doesn’t appear to be Gemini Pro’s strong suit, either. It struggles to give a six-letter word in French:
When I ran the same prompt through Bard (“Can you give me a 6-letters word in French?”), Gemini Pro responded with a seven-letter word instead of a six-letter one — which gives some credence to the reports about Gemini’s poor multilingual performance.
What about summarizing news? Surely Gemini Pro, with Google Search and Google News at its disposal, can give a recap of something topical? Not necessarily.
It seems Gemini Pro is loath to comment on potentially controversial news topics, instead telling users to… Google it themselves.
I tried the same prompt and got a very similar response. ChatGPT, by contrast, gives a bullet-list summary with citations to news articles:
Interestingly, Gemini Pro did provide a summary of updates on the war in Ukraine when I asked it for one. However, the information was over a month out of date:
Google emphasized Gemini’s enhanced coding skills in a briefing earlier this week. Perhaps it’s genuinely improved in some areas — posts on X suggest as much. But it also appears that Gemini Pro struggles with basic coding functions like this one in Python:
And these:
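The examples above are embedded posts and aren’t reproduced here, but to give a sense of the difficulty level involved, a “basic coding function” of the sort users were asking for might look something like the following — a hypothetical prompt and reference solution for illustration, not one of the failures from the posts:

```python
# Hypothetical example of a "basic coding function" request: a short task an
# entry-level Python programmer could write from memory.
def is_anagram(a: str, b: str) -> bool:
    """Return True if the two strings are anagrams, ignoring case and spaces."""
    def normalize(s: str) -> list:
        return sorted(s.replace(" ", "").lower())
    return normalize(a) == normalize(b)


print(is_anagram("listen", "silent"))  # True
print(is_anagram("hello", "world"))    # False
```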
And, as with all generative AI models, Gemini Pro isn’t immune to “jailbreaks,” i.e. prompts that get around the safety filters meant to prevent it from discussing controversial topics.
Using an automated method to algorithmically change the context of prompts until Gemini Pro’s guardrails failed, AI security researchers at Robust Intelligence, a startup selling model-auditing tools, managed to get Gemini Pro to suggest ways to steal from a charity and assassinate a high-profile individual (albeit with “nanobots” — admittedly not the most realistic weapon of choice).
Now, Gemini Pro isn’t the most capable version of Gemini — that model, Gemini Ultra, is set to launch sometime next year in Bard and other products. Google compared the performance of Gemini Pro to GPT-4’s predecessor, GPT-3.5, a model that’s around a year old.
But Google nevertheless promised improvements in reasoning, planning and understanding with Gemini Pro over the previous model powering Bard, claiming Gemini Pro was better at summarizing content, brainstorming and writing. Clearly, it has some work to do in those departments.