Can Apple’s On-Device AI System ReALM Outperform GPT-4?

Reports suggest Apple is gearing up to enter the AI race, with Siri set to get a notable performance boost and new capabilities.


Apple is all set for WWDC 2024 in June, where they’re expected to make some big AI announcements. While the exact details are still under wraps, one focus will be enhancing Siri, which has been a point of frustration for many iPhone users.

Recently, Apple’s AI researchers released a paper on a new system called Reference Resolution as Language Modeling (ReALM). This system aims to improve how AI understands context in conversations. If successful, ReALM could help Siri better understand what’s on your screen, remember details from past conversations, and even recognize background activities. This could mean exciting changes for Siri, possibly in time for WWDC.

With ReALM, understanding conversational references, what’s on your screen, and even background activities becomes simpler. The method shifts away from traditional reference-resolution approaches, which rely on separate pipelines for conversational and visual context. Instead, it converts everything, including on-screen content, into plain text, making it easier for large language models (LLMs) to grasp what the user is referring to.
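To illustrate the core idea of "everything becomes text," here is a minimal sketch of how on-screen entities might be serialized into a prompt for an LLM. The function names, entity fields, and prompt format are illustrative assumptions for this article, not Apple’s actual implementation:

```python
# Sketch of the "convert everything to text" idea behind ReALM:
# candidate on-screen entities are rendered as tagged lines of text,
# so resolving a reference like "call that number" becomes a pure
# language-modeling task. All names/formats here are hypothetical.

def serialize_entities(entities):
    """Render on-screen entities as an indexed, tagged text block."""
    lines = []
    for i, entity in enumerate(entities, start=1):
        lines.append(f"[{i}] type={entity['type']} value={entity['value']}")
    return "\n".join(lines)

def build_prompt(utterance, entities):
    """Combine the user's request with the textified screen context."""
    return (
        "On-screen entities:\n"
        + serialize_entities(entities)
        + f"\nUser request: {utterance}\n"
        + "Which entity index does the request refer to?"
    )

screen = [
    {"type": "phone_number", "value": "+1 555 0100"},
    {"type": "address", "value": "1 Infinite Loop, Cupertino"},
]
prompt = build_prompt("call that number", screen)
print(prompt)
```

An LLM given this prompt only needs to output an entity index, which is what lets even a small model handle the task well.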

The researchers compared ReALM models to GPT-3.5 and GPT-4, which are currently used in ChatGPT and ChatGPT Plus. They found that their smallest model performed similarly to GPT-4, while their larger models performed even better.

“We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for onscreen references,” the researchers explained in the paper. “We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it.”

The paper describes four sizes of the ReALM model: ReALM-80M, ReALM-250M, ReALM-1B, and ReALM-3B. The “M” and “B” stand for millions and billions of parameters, respectively. For comparison, GPT-3.5 has 175 billion parameters, while GPT-4 is said to have around 1.5 trillion parameters.

“We show that ReaLM outperforms previous approaches, and performs roughly as well as the state-of-the-art LLM today, GPT-4, despite consisting of far fewer parameters,” the paper states.

While ReALM outperforms GPT-4 in this specific benchmark, it’s important to note that this doesn’t necessarily make it a superior model overall. ReALM simply excelled in a benchmark designed to play to its strengths.

However, it remains unclear whether Apple will use this research in iOS 18 or its newest devices.
