
Inside Apple’s AI architecture: Custom Gemini, sparse models and divergence

June 9, 2026

The Worldwide Developers Conference (WWDC) keynote for 2026 is pivotal for Apple in more ways than one. Through the keynote, the English idiom “better late than never” kept floating into thought: an expression of relief when an expected accomplishment finally occurs after some time has passed. Apple’s refreshed artificial intelligence (AI) strategy finds the tech giant completing a jigsaw that has been awaiting this moment for a while. The new Apple Intelligence suite, of which the new Siri AI is a big part, requires deeper understanding. Apple has not wavered from its long-standing data privacy and security promise, even with the infusion of Google’s Gemini models.

The new Siri AI in action, and (right) an illustration of the AFM 3 Core Advanced model architecture. (Official image)

One of the biggest questions emerging from the post-keynote intrigue: how exactly does the architecture shape up, particularly with Google’s Gemini models? Immediately after the keynote, HT was invited to a technical deep dive with Craig Federighi, Apple’s senior vice president of Software Engineering, who was joined by Amar Subramanya, vice president of AI at Apple, Siri chief Mike Rockwell, and Sebastien Marineau-Mes, vice president of software. With Apple Intelligence at the core of Apple’s newest operating systems, understanding those contours becomes crucial.


What models are in play?

Apple’s AI structure is built atop the company’s own Apple Foundation models. This is the third generation of these models, with the first generation arriving in 2024 and the second last year. There are two on-device models: the AFM 3 Core, a 3-billion-parameter dense model, and the AFM 3 Core Advanced, a 20-billion-parameter natively multimodal model that uses a sparse architecture to activate anywhere between 1 billion and 4 billion parameters at a time, depending on the request type.

There are three server-based models as well—the AFM 3 Cloud which Apple calls a server-side workhorse, optimized for speed, efficiency, and performance, the AFM 3 Cloud Pro for demanding use cases like agentic tool use and complex reasoning, and the ADM 3 Cloud (Image) for image generation and editing, which also unlocks advanced photo-editing tools in the iPhone. Amar Subramanya explains that these models represent a significant generational leap as far as quality of output and overall capabilities are concerned.

For the sparse architecture in particular, Subramanya calls the idea “pretty intuitive.” The way this works is that instead of enabling all parameters in a model as would otherwise happen, a sparse model uses a subset of parameters each time a request is sent to it.

“This is super powerful because you can build a big model and use only a subset or a slice of it each time a request is sent to the model. And this is the reason that this architecture has become the designer choice for all of the frontier models today,” he explains. He adds that Apple built this sparse model from scratch to avoid the cost of swapping parameters, which would increase token usage; on-device, that would run into constraints ranging from memory to higher battery consumption.


Specifically for the AFM 3 Core Advanced, Subramanya explains that unlike typical server-side models, this model looks at an entire query or request and chooses the right set of parameters. “So you’re not having to reload parameters with every token, and this dramatically cuts down the cost of loading these parameters,” he says.
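The difference Subramanya describes, choosing a parameter slice once per request rather than once per token, can be illustrated with a toy count of how often a slice would need to be loaded. This is a minimal sketch under stated assumptions, not Apple's router: the `route_token` rule, the slice names and the majority-vote choice are all hypothetical stand-ins for a learned routing network.

```python
from collections import Counter

def route_token(token: str) -> str:
    """Hypothetical router: a keyword rule standing in for a learned one."""
    if token.isdigit():
        return "math"
    if token.startswith("#"):
        return "code"
    return "language"

def loads_per_token(tokens):
    """Count slice loads when the router runs on every token.

    A load is counted each time the chosen slice differs from the one
    currently in memory.
    """
    loads, current = 0, None
    for tok in tokens:
        chosen = route_token(tok)
        if chosen != current:
            loads += 1
            current = chosen
    return loads

def loads_per_request(tokens):
    """Pick one slice for the whole request (majority vote) and load it once."""
    if not tokens:
        return None, 0
    chosen = Counter(route_token(t) for t in tokens).most_common(1)[0][0]
    return chosen, 1

# Hypothetical request that mixes several kinds of token.
request = ["add", "2", "and", "3", "then", "#sum"]
print(loads_per_token(request))    # one count per change of slice
print(loads_per_request(request))  # a single slice, loaded once
```

In this toy case the per-token router swaps slices on every token, while the per-request router commits to one slice up front. The trade-off, in the sketch at least, is that a single slice has to serve every token in the request.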

But, what is Google’s role in this?

The two on-device models and the three server-side models in particular are where Gemini plays a role. “These models are specifically designed for Apple Intelligence experiences,” explains Federighi. Apple neither uses any of the Gemini models that Google deploys for its customers, nor does it use the infrastructure Google uses to deploy models for those customers. These are models that are custom made for Apple, by Google.

They will play a wider role, depending on the query, across the broader Apple Intelligence suite, which also includes intelligent photo editing tools, personal context understanding, updated Writing Tools, and Apple Intelligence in Home. This gives Apple much more headroom for visual intelligence across platforms and for the smart home ecosystem, which the company is expected to scale rapidly in the coming years.

How is Apple Intelligence different from a chatbot?

Federighi explained that in a traditional AI chatbot architecture, a user interacts with a tool such as OpenAI’s ChatGPT or Anthropic’s Claude on a phone, either in its app form or via a web browser, which then sends the query to the cloud. It is, as Federighi calls it, “a set of large language models, running in someone’s server infrastructure.” If that query requires web searches, those also happen after a large language model is queried. If we are to take Google’s example, it could be a pick from options including Gemini 3.1 Flash-Lite, Gemini 3.1 Pro, Gemini 3.5 Flash or the Nano Banana 2.

“When it comes to our system, well, we use none of those things,” quipped Federighi. He explains that none of that methodology is part of iOS.

Apple Intelligence finds its foundation in the in-house Apple Foundation models, now in their third generation, which run on device as well as in the cloud, depending on the task. Reasoning, visual understanding and generation are some key elements. Federighi explains that the new Siri AI app, for instance, isn’t reaching out to the same models in the cloud. At the core of Apple’s structure for AI is the baseline system experience which links to apps on the device, including the Siri AI app. This system now invokes a system orchestrator, which is key to the privacy architecture.

“It’s what coordinates requests against things like the toolbox, that provides access to actions within your apps, the Spotlight semantic index, to access personal content to help fulfil your request, and even things like onscreen context, to understand what you might be looking at at the moment you’re making a request,” Federighi explains.

Then comes a powerful set of on-device models which can understand text, speech, as well as the on-screen context. For queries that the orchestrator judges to require a greater level of intelligence, it contacts the Apple Foundation models on Private Cloud Compute.
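The escalation step can be pictured as a simple routing decision. What follows is a rough sketch of the idea, not a description of how Apple's orchestrator actually decides: the `Request` fields, the scoring rule and the threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Request:
    """Hypothetical request shape; the field names are assumptions."""
    text: str
    needs_tools: bool = False       # would call actions inside apps
    needs_reasoning: bool = False   # multi-step or agentic work

def complexity(req: Request) -> int:
    """Hypothetical score: one point per signal that the request is demanding."""
    score = 0
    if len(req.text.split()) > 12:
        score += 1
    if req.needs_tools:
        score += 1
    if req.needs_reasoning:
        score += 1
    return score

def orchestrate(req: Request, threshold: int = 2) -> str:
    """Stay on device by default; escalate only when the score reaches the threshold."""
    if complexity(req) >= threshold:
        return "private-cloud-compute"
    return "on-device"

simple = Request("Set a timer for ten minutes")
demanding = Request("Plan my week around these emails",
                    needs_tools=True, needs_reasoning=True)
print(orchestrate(simple))     # on-device
print(orchestrate(demanding))  # private-cloud-compute
```

Defaulting to the on-device path and escalating only on demand mirrors the order the executives describe: the on-device models come first, and Private Cloud Compute handles the requests that need more.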

Mike Rockwell details how the Siri AI we see now, the new Siri in a way, has been built from the ground up. The new Apple Foundation models provided a strong base. “It allowed us to build a profoundly more capable Siri,” he says.

The privacy question: who sees my data?

The premise of Private Cloud Compute, something Federighi had explained to us in detail a couple of years ago, is to extend the privacy architecture from on-device on an iPhone to the cloud as well. No requests or any accompanying data are stored, and they are never accessible to anyone, including Apple. “All of those properties are something that’s not only built architecturally deeply into the system, but also something that third party researchers can continuously verify,” the executives say.

“One of the most powerful features of the new Siri that we’re incredibly excited about is the ability to use your personal context. And so, like never before, you can ask about the information on your device, you can then take action on that. And we’ve done it in a way that is just trivially easy for folks to access,” explains Rockwell.

“Other folks have talked about personal context, but often that comes with a lot of setup or, in particular, some significant privacy compromises. Your personal data is going to servers. In our case, we took great care about how we did this, so with a combination of Private Cloud Compute and the on-device models, we were able to deliver paid, fantastic experience,” he adds.

Does any of the data go to Google?

The simple answer is no. “Apple is in control of what software gets deployed to these nodes. So we, and only we, can deploy software to these nodes that are running in Google’s Cloud,” explains Sebastien Marineau-Mes, before adding, “Apple devices themselves are only able or only allowed to talk to the software that’s been signed by Apple. Even though that software is running in third-party cloud, Apple devices will only talk to authentic Apple code running in Private Cloud Compute. And so I think it makes for a very, very strong solution.”
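The principle that a device only talks to signed software can be sketched as a verify-before-connect check. This is a simplified illustration and not Apple's mechanism: it uses a shared-secret HMAC from Python's standard library as a stand-in, whereas a production system would rely on asymmetric signatures and hardware attestation. The key, the build strings and the function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical signing key, for illustration only. A shared secret stands in
# for the asymmetric signing a real deployment would use.
SIGNING_KEY = b"demo-key-not-a-real-secret"

def sign(software_image: bytes, key: bytes = SIGNING_KEY) -> str:
    """Produce a signature over the software image (HMAC-SHA256 stand-in)."""
    return hmac.new(key, software_image, hashlib.sha256).hexdigest()

def device_accepts(software_image: bytes, signature: str,
                   key: bytes = SIGNING_KEY) -> bool:
    """Return True only if the signature matches the image the node is running."""
    expected = sign(software_image, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)

# Hypothetical builds: one signed as released, one altered afterwards.
official_build = b"inference-server build 1"
official_sig = sign(official_build)
tampered_build = b"inference-server build 1 + extra logging"

print(device_accepts(official_build, official_sig))  # True
print(device_accepts(tampered_build, official_sig))  # False
```

Any change to the software changes the expected signature, so in this sketch a node running altered code is refused regardless of whose data centre hosts it.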

Apple said they don’t need a chatbot app…?

It was last year that Federighi and Greg “Joz” Joswiak, Apple’s senior vice president of Worldwide Marketing, made it clear that Apple didn’t see the need for a chatbot app. That, in a way, suggested Apple wouldn’t make one either. One could perceive the new Siri AI app as one, but executives see it differently.

“We see Siri not as a separate chatbot, an unintegrated place you go and chit chat, but rather as an integral, conversational tool, that you use in the moment. It is deeply integrated into your experience, understanding what’s on screen—not in some separate world but directly in a document that you’re editing and want help proofreading. While these experiences are conversational, they are really an extension of your system experience deeply integrated into your flow,” explains Federighi.
