Google on Wednesday unveiled Gemini, its next-generation multimodal artificial intelligence (AI) model, to take on GPT-4 from Microsoft-backed rival OpenAI. The company calls Gemini the "most capable, flexible, and general AI model" it has ever built.
Gemini was developed by researchers from Google DeepMind, the unit formed by merging Google's AI divisions DeepMind and Google Brain. The new large language model (LLM) can generalize across, seamlessly understand, and combine different types of information, including text, code, audio, images, and video.
Gemini 1.0, the company's first version, comes in three sizes: Ultra, Pro, and Nano. Gemini Ultra is built "for highly complex tasks," Gemini Pro is designed for scaling "across a wide range of tasks," and Gemini Nano is the company's most efficient model "for on-device tasks."
"We're taking the next step on our journey [as an AI-first company] with Gemini, our most capable and general model yet, with state-of-the-art performance across many leading benchmarks," Google CEO Sundar Pichai wrote in a note introducing the company's blog post about the announcement.
“Our first version, Gemini 1.0, is optimized for different sizes: Ultra, Pro, and Nano. These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year.”
According to Google, Gemini Ultra outperformed current "state-of-the-art" models, including OpenAI's GPT-4, the most powerful model behind ChatGPT, on 30 of the 32 widely used academic benchmarks in LLM research and development.
Gemini Ultra scored 90.0% on MMLU (massive multitask language understanding), a benchmark that combines 57 subjects such as math, physics, history, law, medicine, and ethics to test both world knowledge and problem-solving abilities. According to the company, that makes Gemini Ultra the first model to outperform human experts (89.8%) on MMLU, and it also beats GPT-4 (86.4%).
In addition, Google said Gemini can "understand, explain and generate high-quality code in the world's most popular programming languages, like Python, Java, C++, and Go. Its ability to work across languages and reason about complex information makes it one of the leading foundation models for coding in the world."
As of Wednesday, Gemini Pro is available free of charge to anyone with a Google account through Google's Bard chatbot. It is offered in English in more than 170 countries, including the U.S., and Google expects to expand it to more modalities and to new languages and locations in the near future.
Gemini Nano, meanwhile, is now available on the Pixel 8 Pro smartphone and is likely to roll out to other Pixel models soon. Finally, Ultra, the most powerful of the three, is still undergoing external testing and is not expected to be released publicly before early 2024.