Google has unveiled Gemini, its latest and most potent multimodal general AI model.
This advanced AI is now accessible through Bard, select developer platforms, and the newly released Google Pixel 8 Pro.
I experimented with Gemini Pro, the version available in Bard, exploring its content summarisation, image recognition, and voice-to-text features.
Seeing some qs on what Gemini *is* (beyond the zodiac :). Best way to understand Gemini’s underlying amazing capabilities is to see them in action, take a look ⬇️ pic.twitter.com/OiCZSsOnCc
— Sundar Pichai (@sundarpichai) December 6, 2023
Gemini is a large language model (LLM) developed by Google's DeepMind division, with the aim of competing with other AI systems like OpenAI's ChatGPT and potentially surpassing them.
Key features of Gemini
Gemini can process various types of information, including text, images, and more.
This enables it to engage in conversations and recognize real-time video content effectively.
It is part of Google's new generation of models built on Pathways, Google's AI infrastructure.
Categorised as one of the "next-generation multimodal models," Gemini is presumed to be among the largest language models developed so far.
Gemini is available in different versions, each with its unique strengths.
Some versions might use memory, fact-check via Google Search, and keep learning to improve accuracy and safety over time.
We’re excited to announce Gemini: @Google’s largest and most capable AI model.
Built to be natively multimodal, it can understand and operate across text, code, audio, image and video - and achieves state-of-the-art performance across many tasks. 🧵 https://t.co/mwHZTDTBuG pic.twitter.com/zfLlCGuzmV
— Google DeepMind (@GoogleDeepMind) December 6, 2023
I tested Gemini Pro in Bard across its text, image, and voice response functions.
Here's how it responded in each mode:
When prompted to summarise 'David Copperfield,' Bard provided a structured breakdown by chapters along with pertinent links to related topics.
The voice-to-text feature operates effectively. When a spoken question is recorded, it accurately appears as text in the search bar, promptly generating the desired search results.
When testing the image identification feature, it successfully recognized generic images. However, it faced challenges in identifying individuals depicted in the provided images.
When prompted to identify a picture of Prime Minister Narendra Modi, it responded with, 'I can't help with images of people yet,' indicating its inability to recognise individuals.
Even when asked to identify landmarks such as India Gate and the Colosseum in Rome, the AI struggled to name the places accurately.
Plans are in place to enhance and extend these features, making them accessible across all devices.
Gemini comes in three sizes to match different needs.
Introducing Gemini 1.0, our most capable and general AI model yet. Built natively to be multimodal, it’s the first step in our Gemini-era of models. Gemini is optimized in three sizes - Ultra, Pro, and Nano
Gemini Ultra’s performance exceeds current state-of-the-art results on… pic.twitter.com/pzIw6iCPPN
— Sundar Pichai (@sundarpichai) December 6, 2023
Now Gemini Pro is coming today in Bard’s biggest update yet (in English in 170 countries) with more advanced reasoning and understanding in the responses. Bard Advanced with Ultra, our most general and capable model for highly complex tasks, is coming early next year.… pic.twitter.com/x6W90HJMJw
— Sundar Pichai (@sundarpichai) December 6, 2023
Gemini Nano is super efficient for tasks that are on-device. Android developers can sign up for an early access program for Gemini Nano via Android AICore and Pixel 8 Pro users can already see it rolling out in features like Summarize in Recorder and Smart Reply in Gboard + much… pic.twitter.com/KFIei4D9Pc
— Sundar Pichai (@sundarpichai) December 6, 2023
Currently, Gemini is accessible in English across over 170 countries and territories. It is slated for expansion into additional languages and regions, including Europe, in the near future.