18 Dec 2023

Google launches Gemini AI - "its largest and most capable yet"

Google has announced the launch of Gemini, an AI platform said to be its largest and most capable yet, which it believes could rival ChatGPT.

According to the tech giant, this is a significant milestone in the development of AI, and the beginning of a new era for Google as it continues to rapidly innovate and responsibly advance the capabilities of its models.

Gemini is the result of large-scale collaborative efforts by teams across Google, built from the ground up to be multimodal, which means “it can generalise and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video”, the company claims.

Google says that Gemini is also its most flexible model yet, able to run efficiently on everything from data centres to mobile devices. The company expects its state-of-the-art capabilities to significantly enhance the way developers and enterprise customers build and scale with AI.

Next-generation capabilities

Until now, the standard approach to creating multimodal models involved training separate components for different modalities and then stitching them together to roughly mimic some of this functionality, Google says. These models can sometimes be good at performing certain tasks, like describing images, but struggle with more conceptual and complex reasoning.

The tech giant asserts that Gemini is different because it was designed to be “natively multimodal, pre-trained from the start on different modalities”, and then fine-tuned with additional multimodal data to further refine its effectiveness. According to Google, this helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models, and its capabilities are state of the art in nearly every domain.


State-of-the-art performance

Google says it has rigorously tested its models and that Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely used academic benchmarks used in large language model (LLM) research and development.

Just as one example, with a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.

Sophisticated reasoning

Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information, Google claims. This is said to make it uniquely skilled at uncovering knowledge that can be difficult to discern amid vast amounts of data.

Understanding text, images, audio and more

Gemini 1.0 was trained to recognise and understand text, images, audio and more at the same time, so it is said to better understand nuanced information and answer questions relating to complicated topics. This, Google claims, makes it especially good at explaining reasoning in complex subjects like math and physics.

Gemini Ultra coming soon

For Gemini Ultra, Google is currently completing extensive trust and safety checks, including red-teaming by trusted external parties, and further refining the model using fine-tuning and reinforcement learning from human feedback (RLHF) before making it broadly available.

As part of this process, Google will make Gemini Ultra available to select customers, developers, partners and safety and responsibility experts for early experimentation and feedback before rolling it out to developers and enterprise customers early next year.

Early next year, the firm will also launch Bard Advanced, a new, innovative AI experience that gives users access to its best models and capabilities, starting with Gemini Ultra.

Google and Alphabet CEO Sundar Pichai commented: “Every technology shift is an opportunity to advance scientific discovery, accelerate human progress, and improve lives. I believe the transition we are seeing right now with AI will be the most profound in our lifetimes, far bigger than the shift to mobile or to the web before it. AI has the potential to create opportunities, from the everyday to the extraordinary, for people everywhere. It will bring new waves of innovation and economic progress and drive knowledge, learning, creativity and productivity on a scale we haven’t seen before.” He emphasised: “That’s what excites me: the chance to make AI helpful for everyone, everywhere in the world.”

Demis Hassabis, CEO and Co-Founder of Google DeepMind, on behalf of the Gemini team, added: “AI has been the focus of my life's work, as for many of my research colleagues. Ever since programming AI for computer games as a teenager, and throughout my years as a neuroscience researcher trying to understand the workings of the brain, I’ve always believed that if we could build smarter machines, we could harness them to benefit humanity in incredible ways.

“This promise of a world responsibly empowered by AI continues to drive our work at Google DeepMind. For a long time, we’ve wanted to build a new generation of AI models, inspired by the way people understand and interact with the world. AI that feels less like a smart piece of software and more like something useful and intuitive — an expert helper or assistant.

“We’ve made great progress on Gemini so far and we’re working hard to further extend its capabilities for future versions, including advances in planning and memory, and increasing the context window for processing even more information to give better responses.

“We’re excited by the amazing possibilities of a world responsibly empowered by AI — a future of innovation that will enhance creativity, extend knowledge, advance science and transform the way billions of people live and work around the world.”
