Google I/O: All the details on what's ahead for Gemini

Google I/O: Comprehensive insights into Gemini's future developments and plans.

May 16, 2024 - 12:51

May 21, 2024 - 11:08


Gone are the days of simple search results and isolated productivity tools. Gemini will now be integrated closely with Google services and will feature prominently in search results. The move reflects Google's vision of AI-driven services and echoes OpenAI's unveiling of GPT-4o the day before. For Android users, the integration will be extensive: Gemini will be built deeply into Android 15, including Gemini Nano, a scaled-down version that runs on the device itself.

In a dynamic presentation, we witnessed how Gemini is set to revolutionize search and our interactions with the world across various devices.

Integration with Google services

In Google Search, Gemini will power AI Overviews, which give contextual answers directly on the search results page.

In Google Photos, Gemini will understand detailed queries. For example, asking "What's my license plate?" will prompt Gemini to scan your photos and provide the relevant result.

Similarly, if you ask about your daughter's swimming progress, such as when she started swimming or how she has improved over time, Gemini recognizes the context and generates a comprehensive 'memory'.

In Gmail, Gemini can almost instantly summarize emails from a specific sender, including attachments such as PDFs.

If you receive a one-hour Google Meet recording, you can ask Gemini for a summary of its key highlights.

Gemini also makes Gmail easier to navigate. Instead of manually searching through emails, you can ask Gemini to work through lengthy threads and answer questions about them in context.

As Google CEO Sundar Pichai highlighted, "Multi-modal rapidly expands the question we can ask, and the answers we get back."

Gemini AI and video search

In a demonstration using the Gemini mobile app, a user films their bookcase, and Gemini promptly identifies all book titles and authors, even when titles are partially obscured.

A demo of Project Astra, Google's prototype AI assistant built on Gemini, showed what conversing with it through a smartphone camera could look like. Questions such as "Tell me when you see something that makes a sound," "Where did you see my glasses?" and "What does this code do?" were answered instantly and accurately. If real-world performance matches the demo, users can expect a highly capable assistant on any device.

Gemini in business

Google also showed an AI teammate, named Chip in the demo, that could prove popular with companies using Google Workspace. Added as a virtual team member in services such as Google Chat, it keeps track of projects and provides an overview of their progress.

In the demos, Chip reported on key decisions and whether they had been approved, and summarized blockers ahead of a project launch. Google said this kind of functionality could end up "saving hours or dozens of hours of a person’s time."

Enhancing Gemini inputs

A key announcement at the event was Google's plan to upgrade Gemini 1.5 Pro with a 2 million token context window, double the 1 million tokens of the previous version. This lets users analyze larger documents and media inputs. With the upgrade, Gemini 1.5 Pro has the largest context window among the leading commercial models, well ahead of GPT-4, which supports context lengths of up to 32,000 tokens, and GPT-4 Turbo, which extends this to 128,000 tokens.
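To put these figures in perspective, here is a rough Python sketch comparing the window sizes, using the rounded numbers quoted above rather than exact API limits. The dictionary, the helper names and the 0.75 words-per-token conversion are illustrative assumptions, not part of any Google or OpenAI API.

```python
# Back-of-the-envelope comparison of the context window sizes quoted in this article.
# Figures are rounded, as reported; they are not exact API limits.
CONTEXT_WINDOWS = {
    "Gemini 1.5 Pro (previous)": 1_000_000,
    "Gemini 1.5 Pro (upgraded)": 2_000_000,
    "GPT-4 (32K variant)": 32_000,
    "GPT-4 Turbo": 128_000,
}

# Assumption: a common rule of thumb for English text, not an official figure.
WORDS_PER_TOKEN = 0.75


def ratio_vs(reference: str, other: str) -> float:
    """Return how many times larger the reference model's window is than the other's."""
    return CONTEXT_WINDOWS[reference] / CONTEXT_WINDOWS[other]


def approx_words(tokens: int) -> float:
    """Very rough estimate of how many English words fit in a given number of tokens."""
    return tokens * WORDS_PER_TOKEN


if __name__ == "__main__":
    upgraded = "Gemini 1.5 Pro (upgraded)"
    for name in CONTEXT_WINDOWS:
        if name != upgraded:
            print(f"{upgraded} is {ratio_vs(upgraded, name):g}x {name}")
    print(f"~{approx_words(CONTEXT_WINDOWS[upgraded]):,.0f} words in 2M tokens")
```

By this rough arithmetic, the upgraded window is 62.5 times GPT-4's 32,000 tokens and about 15.6 times GPT-4 Turbo's 128,000 tokens.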

Improving image creation with Imagen 3

Google also introduced Imagen 3, its latest text-to-image model, which will power the image creation tool ImageFX. The model competes with OpenAI’s DALL-E 3 and offers several improvements over Imagen 2, including better detail, richer lighting, fewer distracting artifacts and better text rendering. Imagen 3 will initially be available to select users as a private preview via ImageFX and will later come to other Google products such as the Gemini app, Google Workspace and Google Ads.

Veo's introduction and competition with Sora

Another major announcement was Veo, a generative AI video model that can create 1080p videos longer than a minute. Veo can produce high-quality videos in a variety of cinematic and visual styles and will initially be offered to select users via VideoFX. Veo positions Google as a direct competitor to OpenAI's Sora, and Google's plan to bring it to YouTube Shorts would give AI-generated video content a platform with a much wider audience.

Introduction of the Trillium chip

Google unveiled Trillium, its sixth-generation TPU (Tensor Processing Unit) for data centers, which it says delivers 4.7 times the peak compute performance per chip of its predecessor, the TPU v5e. With demand for artificial intelligence and machine learning (ML) processing power growing rapidly, Trillium gives Google a custom alternative to Nvidia, which currently holds around 80% of the market. Despite the competition, Google acknowledged Nvidia's strengths, signaling a non-hostile approach.

Wrap-Up

The Google I/O presentation covered many more use cases than are summarized here. For example, Google described how returning a shopping item could become as simple as pointing your phone at it: Gemini would locate the product, find the receipt, contact the supplier and arrange a pickup date, all on its own. Google says these features will roll out over the coming weeks and months, requiring little or no effort from end users. OpenAI has made significant advances, but Google's existing market reach gives it a strong position for the future. The next six months should be interesting to watch as these developments unfold.