Brain in a Bubble

illusion of thinking

With WWDC on deck, Apple says “reasoning” AI models collapse with complexity

Apple tested state-of-the-art “chain of thought” models and found that they aren’t “reasoning,” but merely pattern matching, calling into question the direction the industry is taking.

Jon Keegan

6/9/25 10:34AM

Apple’s troubled AI rollout was plagued by a series of remarkable feature failures and product delays.

What was supposed to be the year of “Apple Intelligence” has failed to deliver an AI-enhanced Siri on par with voice assistants from competitors like Google, OpenAI, and Meta. This week, all eyes are on Apple as it holds its Worldwide Developers Conference (WWDC), where observers will be watching for how it plans to get back into the AI race.

But behind the scenes, researchers at Apple have been digging into the competition’s latest and greatest “reasoning” models to see how they respond to tricky challenges as they scale in complexity.

In a new paper, Apple’s researchers found that the leading state-of-the-art “chain of thought” models “face a complete accuracy collapse” once the researchers dialed up the complexity of puzzle-based tests. The spectacular failures of the models led the researchers to question their “reasoning” label, calling it instead “the illusion of thinking.”

The suite of tests included puzzles like “Tower of Hanoi,” in which the player must move a stack of disks of various sizes from one post to another, moving one disk at a time, taking only the top disk from a post, and never placing a larger disk on a smaller one.
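To make the puzzle's scaling concrete, here is a minimal Python sketch of the classic recursive Tower of Hanoi solution together with a simple move checker. It is an illustration only, not code from Apple's paper: the function names `hanoi` and `is_valid` and the post labels "A", "B", and "C" are assumptions made for this example, and the checker is not a description of how the researchers scored model output. The point it shows is that the shortest solution for n disks takes 2^n - 1 moves, so each added disk roughly doubles the length of a correct answer.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the shortest list of (from_post, to_post) moves for n disks."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then restack.
    return (
        hanoi(n - 1, source, spare, target)
        + [(source, target)]
        + hanoi(n - 1, spare, target, source)
    )


def is_valid(n, moves):
    """Replay moves from the start position and check the puzzle's rules."""
    # Each post is a list with the largest disk first and the top disk last.
    posts = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not posts[src]:
            return False  # nothing to pick up from this post
        disk = posts[src][-1]
        if posts[dst] and posts[dst][-1] < disk:
            return False  # a larger disk would land on a smaller one
        posts[dst].append(posts[src].pop())
    # Solved only if every disk ended up on post C in order.
    return posts["C"] == list(range(n, 0, -1))


for n in (3, 7, 10):
    moves = hanoi(n)
    print(n, len(moves), is_valid(n, moves))
# 3 7 True
# 7 127 True
# 10 1023 True
```

A single illegal or missing move anywhere in the sequence makes the checker return False, which is one way to see why long solutions leave little room for error.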

A figure from Apple’s “Illusion of Thinking” paper showing the models’ collapse in accuracy as the complexity is dialed up. (Source: Apple)

While the models could solve the simplest versions of the puzzles, they fell flat once things got more complex. The research tested the reasoning models DeepSeek-R1, OpenAI’s o3-mini, and Anthropic’s Claude 3.7 Sonnet Thinking.

Chain of “thought”

After hitting performance plateaus from the “more data, more compute” approach, the industry followed OpenAI’s o1 release and started to build “chain of thought” reasoning models, which showed their “thought” processes.

This technique did boost the performance of large language models to new levels, offering a promising new pathway out of what looked to be a computational dead end. While these models required vastly more computing resources and time, the approach seemed to be the way forward.

Apple’s research seems to show that rather than reasoning, these models are merely displaying sophisticated pattern matching.

Apple researchers also examined the “thought” processes behind each solution to the puzzle, to better understand exactly how the models approached solutions.

Very little is known about how these recent models actually work. It remains to be seen if Apple has been cooking up an alternate approach, but reports indicate an AI-enhanced Siri isn’t likely to make a debut at this week’s WWDC.
