
With WWDC on deck, Apple says “reasoning” AI models collapse with complexity

Apple tested state-of-the-art “chain of thought” models and found that they aren’t “reasoning,” but merely pattern matching, calling into question the direction the industry is taking.

Jon Keegan
6/9/25 10:34AM

Apple’s troubled AI rollout was plagued by a series of remarkable feature failures and product delays.

What was supposed to be the year of “Apple Intelligence” has failed to deliver an AI-enhanced Siri on par with voice assistants from competitors like Google, OpenAI, and Meta. This week, all eyes are on Apple as it holds its Worldwide Developers Conference (WWDC) to see what it’s planning to get back in the AI race.

But behind the scenes, researchers at Apple have been digging into the competition’s latest and greatest “reasoning” models to see how they respond to tricky challenges as they scale in complexity.

In a new paper, Apple’s researchers found that the leading state-of-the-art “chain of thought” models “face a complete accuracy collapse” once the researchers dialed up the complexity of puzzle-based tests. The spectacular failures of the models led the researchers to question their “reasoning” label, calling it instead “the illusion of thinking.”

The suite of tests included puzzles like “Tower of Hanoi,” in which the player must move a stack of disks of various sizes from one post to another, moving one disk at a time, taking only the top disk from a post, and never placing a larger disk on a smaller one.
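To give a sense of why the puzzle gets harder as disks are added, here is a minimal Python sketch of the classic recursive solution along with a checker for the rules described above. It is not the test harness used in Apple's paper; the function names `solve_hanoi` and `is_valid_solution` and the peg labels "A", "B", and "C" are illustrative assumptions.

```python
def solve_hanoi(n, source="A", target="C", spare="B"):
    """Return the moves (disk, from_peg, to_peg) that solve an n-disk puzzle.

    Disks are numbered 1 (smallest) to n (largest). Peg names are
    illustrative assumptions, not taken from the paper.
    """
    if n == 0:
        return []
    return (
        solve_hanoi(n - 1, source, spare, target)    # clear the way
        + [(n, source, target)]                      # move the largest disk
        + solve_hanoi(n - 1, spare, target, source)  # restack on top of it
    )


def is_valid_solution(n, moves, source="A", target="C", spare="B"):
    """Replay the moves and check every rule of the puzzle."""
    pegs = {source: list(range(n, 0, -1)), target: [], spare: []}
    for disk, src, dst in moves:
        if src not in pegs or dst not in pegs:
            return False  # unknown peg
        if not pegs[src] or pegs[src][-1] != disk:
            return False  # only the top disk of a peg may be moved
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    # Solved only if the full stack ends up on the target peg.
    return pegs[target] == list(range(n, 0, -1))


if __name__ == "__main__":
    for n in (3, 5, 10):
        moves = solve_hanoi(n)
        print(n, "disks:", len(moves), "moves, valid:", is_valid_solution(n, moves))
```

The shortest solution for n disks takes 2^n - 1 moves, so each added disk roughly doubles the length of the move sequence a solver has to get exactly right: 7 moves for 3 disks, 1,023 for 10.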

A figure from the “Illusion of Thinking” Apple paper showing models’ collapse in accuracy as the complexity is dialed up. (Source: Apple)

While the models could solve the simplest versions of the puzzles, they fell flat on their faces once things got more complex. The research tested the reasoning models DeepSeek-R1, OpenAI’s o3-mini, and Anthropic’s Claude 3.7 Sonnet Thinking.

Chain of “thought”

After hitting performance plateaus from the “more data, more compute” approach, the industry followed OpenAI’s o1 release and started to build “chain of thought” reasoning models, which showed their “thought” processes.

This technique did boost the performance of large language models to new levels, offering a promising new pathway out of what looked to be a computational dead end. While the models required vastly more computational resources and time, the approach seemed to be the way forward.

Apple’s research seems to show that rather than reasoning, these models are merely displaying sophisticated pattern matching.

Apple researchers also examined the “thought” processes behind each solution to the puzzle, to better understand exactly how the models approached solutions.

The fact of the matter is that very little is known about how these recent models actually work. It remains to be seen if Apple has been cooking up an alternate approach, but reports indicate an AI-enhanced Siri isn’t likely to make a debut at this week’s WWDC.


Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.