Google building Gemini to be a proactive, personal universal AI assistant

Mountain View, California: Google insists that a substantial artificial intelligence (AI) layer will quickly find relevance and depth across Search, shopping, Workspace, filmmaking and video communication platforms. That layer is crucial to its vision for a universal AI assistant, detailed at the annual Google I/O conference, and it comes as competitors including OpenAI, Anthropic and Microsoft make significant progress with their own AI tools.

Two Google projects contribute significantly to Gemini’s planned transformation. (HT photo)

“More intelligence is available, for everyone, everywhere. And the world is responding, adopting AI faster than ever before…What all this progress means is that we’re in a new phase of the AI platform shift. Where decades of research are now becoming a reality for people, businesses and communities all over the world,” said Sundar Pichai, CEO, Google and Alphabet.

Pichai cited Project Starline, a 3D video streaming technology unveiled a few years ago, as the underlying tech for the new Google Beam AI video communications platform, which rolls out later this year on HP's computing devices. One of its claimed party pieces is head movement tracking, down to the millimetre.

AI agents prove to be a continuing theme, one that OpenAI, IBM, Anthropic and Microsoft have also recently made a case for.

“Our recent updates to Gemini are critical steps towards unlocking our vision for a universal AI assistant, one that’s helpful in your everyday life, that’s intelligent and understands the context you’re in, and that can plan and take actions on your behalf across any device. This is our ultimate goal for the Gemini app, an AI that’s personal, proactive and powerful,” noted Demis Hassabis, CEO of Google DeepMind, in a session of which HT was a part.

For Google, AI agents will be the result of a multi-pronged approach, one that sees the Gemini 2.5 models gain enhanced reasoning, the Gemini app add Canvas for creative coding and podcast creation, and the new video generation model Veo 3 and image generator Imagen 4 arrive within the app.

Two Google projects contribute significantly to Gemini’s planned transformation. 

This builds on Project Astra, which gives the AI situational context through video understanding, screen sharing and memory. Google said Gemini, including its apps for Android and iOS, has crossed 400 million monthly active users, and that 7 million developers worldwide are building apps with these models.

This will also be a culmination of Project Mariner, which, as Hassabis explained, “explores the future of human-agent interaction, starting with browsers”. This now includes a system of agents that can complete up to ten different tasks at a time. Hassabis said these tasks can include looking up information, making bookings, buying things, and researching a topic, in parallel.
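To make the description concrete, here is a minimal sketch, not Google's implementation, of the pattern Hassabis describes: several independent agent tasks running concurrently on a user's behalf. The `run_agent_task` coroutine is a hypothetical stand-in for an agent that browses, books or researches.

```python
# Minimal sketch (assumed interface, not Project Mariner's actual API):
# run several independent agent tasks at the same time.
import asyncio

async def run_agent_task(task: str) -> str:
    # Placeholder for an agent that looks up information, makes a booking, etc.
    await asyncio.sleep(1)  # simulate the agent working
    return f"completed: {task}"

async def main() -> None:
    tasks = [
        "look up train times to the airport",
        "find a table for four on Friday",
        "compare prices for noise-cancelling headphones",
    ]
    # Each task runs independently; results arrive once all of them finish.
    results = await asyncio.gather(*(run_agent_task(t) for t in tasks))
    for line in results:
        print(line)

asyncio.run(main())
```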

Also, Gemini Live, with camera and screen sharing, is now available for all users on the free tier, on Android devices as well as the Apple iPhone. “In the coming weeks, Gemini Live will integrate more deeply into your daily life. Planning a night out with friends? Discuss the details in Gemini Live, and it instantly creates an event in your Google Calendar,” explained Hassabis, detailing integration plans for Google Maps, Tasks and Keep too.

Google estimated that its rival OpenAI’s ChatGPT had roughly 600 million monthly users in March. Meta’s Mark Zuckerberg claimed in September that Meta AI was then nearing 500 million monthly users. 

Incoming improvements for Gemini 2.5 Pro add new reasoning capabilities with a Deep Think mode. Its specific focus on complex math and coding tasks will be relevant for Gemini's march towards an 'agentic AI' vision. This focus on sophisticated reasoning aligns with a wider industry trend towards AI that can not only generate content but also perform complex problem-solving; OpenAI's o1, Anthropic's Claude and xAI's Grok 3 are examples.

“Since incorporating LearnLM, our family of models built with educational experts, 2.5 Pro is also now the leading model for learning. In head-to-head comparisons evaluating its pedagogy and effectiveness, educators and experts preferred Gemini 2.5 Pro over other models across a diverse range of scenarios,” said Koray Kavukcuoglu, CTO of Google DeepMind.

The lighter Gemini 2.5 Flash receives improvements to reasoning, multimodality, coding and long context handling. For now, the updated 2.5 Flash is available as ‘experimental’ in Google AI Studio for developers, in Vertex AI for enterprises, and in the Gemini app for everyone; its final release is pegged for early June.

Playing a crucial part in Google’s universal AI assistant development is the company’s Search platform. An AI Mode in Search, starting with users in the US, utilises Gemini’s frontier capabilities for advanced reasoning and multimodality. Liz Reid, VP and Head of Google Search, explained that AI Mode will use a query fan-out technique to break down any question a user asks into further subtopics. “This enables Search to dive deeper into the web than a traditional search on Google,” said Reid.
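As a rough illustration of the fan-out idea Reid describes, the sketch below splits a question into subtopics, searches each one separately, and merges the results. It is an assumption-laden toy, not Google's implementation; `decompose` and `search_web` are hypothetical helpers.

```python
# Toy sketch of query fan-out (assumed helpers, not Google Search's API):
# break a question into subtopics, search each, merge the results.
def decompose(question: str) -> list[str]:
    # In AI Mode this decomposition is done by the Gemini model; hard-coded here.
    return [
        f"{question} key specifications",
        f"{question} expert reviews",
        f"{question} price comparison",
    ]

def search_web(subquery: str) -> list[str]:
    # Stand-in for an actual search backend.
    return [f"result for '{subquery}'"]

def answer(question: str) -> list[str]:
    results: list[str] = []
    for subquery in decompose(question):      # fan out into subtopics
        results.extend(search_web(subquery))  # gather results per subtopic
    return results  # merged results would feed the final AI-generated response

print(answer("best lightweight travel laptop"))
```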

