Alphabet’s Gemini AI model has been public for only two months, but the company is already releasing an upgrade. Gemini Pro 1.5, launching with limited availability today, is more powerful than its predecessor and can handle huge amounts of text, video, or audio input at a time.
Demis Hassabis, CEO of Google DeepMind, which developed the new model, compares its vast capacity for input to a person’s working memory, something he explored years ago as a neuroscientist. “The great thing about these core capabilities is that they unlock sort of ancillary things that the model can do,” he says.
In a demo, Google DeepMind showed Gemini Pro 1.5 analyzing a 402-page PDF of the Apollo 11 communications transcript. The model was asked to find humorous portions and highlighted several moments, like when astronauts said that a communications delay was due to a sandwich break. Another demo showed the model answering questions about specific actions in a Buster Keaton movie. The previous version of Gemini could have answered these questions only for much shorter amounts of text or video. Google hopes that the new capabilities will allow developers to build new kinds of apps on top of the model.
“It really feels quite magical how the model performs this sort of reasoning across every single page, every single word,” says Oriol Vinyals, a research scientist at Google DeepMind.
Google says Gemini Pro 1.5 can ingest and make sense of an hour of video, 11 hours of audio, 700,000 words, or 30,000 lines of code at once—several times more than other AI models, including OpenAI’s GPT-4, which powers ChatGPT. The company has not disclosed the technical details behind this feat. Hassabis says that one use for models that can handle large amounts of text, tested by researchers at Google DeepMind, is identifying the important takeaways in Discord discussions with thousands of messages.
Gemini Pro 1.5 is also more capable—at least for its size—as measured by the model’s score on several popular benchmarks. The new model exploits a technique previously invented by Google researchers to squeeze out more performance without requiring more computing power. The technique, called mixture of experts, selectively activates parts of a model’s architecture that are best suited to solving a given task, making it more efficient to train and run.
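Google has not published how Gemini Pro 1.5 implements this, so the following is only a generic sketch of the mixture-of-experts idea, not the company's method. It assumes a common textbook form of the technique, top-k gating, in which a small router scores every expert for a given input and only the best-scoring few are actually run. The class name `TinyMoE`, its parameters, and the random weights are all hypothetical and chosen purely for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


class TinyMoE:
    """Hypothetical toy layer: a router plus several linear 'experts'."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one scoring vector per expert (random here, learned in practice).
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each expert is a dim x dim matrix (again random, for illustration only).
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def route(self, x):
        """Score all experts, keep the top_k, and normalize their weights."""
        scores = [dot(w, x) for w in self.router]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        chosen = ranked[:self.top_k]
        weights = softmax([scores[i] for i in chosen])
        return chosen, weights

    def forward(self, x):
        """Run only the chosen experts and blend their outputs."""
        chosen, weights = self.route(x)
        out = [0.0] * len(x)
        for idx, weight in zip(chosen, weights):
            expert_out = [dot(row, x) for row in self.experts[idx]]
            out = [o + weight * v for o, v in zip(out, expert_out)]
        return out, chosen


# Usage: 8 experts exist, but only 2 are evaluated for this input.
moe = TinyMoE(dim=4, num_experts=8, top_k=2)
output, used = moe.forward([1.0, 0.5, -0.3, 2.0])
print("experts used:", used)
```

In this toy version, six of the eight experts do no work for the input, which is the source of the efficiency gain the technique is known for: the model can hold many parameters while spending compute on only a fraction of them per input.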
Google says that Gemini Pro 1.5 is as capable as its most powerful offering, Gemini Ultra, in many tasks, despite being a significantly smaller model. Hassabis says there is no reason why the same technique used to improve Gemini Pro cannot be applied to boost Gemini Ultra.
The upgraded version of Gemini Pro will be made available to developers through AI Studio, a sandbox for testing model capabilities, and to a limited number of developers through Google’s Vertex AI cloud platform API. There’s no date yet for a general release.
Google is also launching new tools to help developers use Gemini in their applications, including new ways of tapping into the models’ ability to parse video and audio. The company also said it is adding new Gemini-powered features to its web-based coding tool, Project IDX, including ways for AI to debug and test code.
The speed of Gemini’s upgrade is a sign of a furious AI race kicked off by the success of ChatGPT. Earlier this week, OpenAI announced that it is giving ChatGPT the ability to remember useful information from conversations over long periods of time. Last week, Google rebranded its chatbot Bard and announced that Gemini Ultra would be available with a paid subscription.
The frenetic pace of progress in generative AI is at odds with worries about the risks the technology might pose. Google says it has put Gemini Pro 1.5 through extensive testing and that providing limited access offers a way to gather feedback on potential risks. The company says it has also provided researchers at the UK’s AI Safety Institute with access to its most powerful models so that they can test them.
Hassabis says to expect more advances in the months to come. “This is a new cadence,” he says, “I’m trying to bring from a sort of startup mentality.”