GPT-5: “A legitimate PhD level expert in anything” that sucks at spelling and geography

OpenAI spent a lot of time talking about how “smart” GPT-5 is, yet it failed spectacularly at tasks that a second grader could achieve.

Jon Keegan

Yesterday, OpenAI spent a lot of time talking about how “smart” its new GPT-5 model was.

OpenAI cofounder and CEO Sam Altman said of GPT-5: “It’s like talking to an expert, a legitimate PhD level expert in anything,” and called it “the most powerful, the most smart, the fastest, the most reliable and the most robust reasoning model that we’ve shipped to date.”

Demos showed GPT-5 effortlessly creating an interactive simulation to explain the Bernoulli effect, diagnosing and fixing complex code errors, planning personalized running schedules, and “vibecoding” a cartoony 3D castle game. The company touted benchmarks showing how GPT-5 aced questions from a Harvard-MIT mathematics tournament and got high scores on coding, visual problem-solving, and other impressive tasks.

But once the public got its chance to kick GPT-5’s tires, some cracks started to emerge in this image of a superintelligent expert in everything.

AI models are famously bad at spelling different kinds of berries, and GPT-5 is no exception.

I had to try the “blueberry” thing myself with GPT5. I merely report the results.

— Kieran Healy (@kjhealy.co) August 7, 2025 at 8:04 PM

Another classic failure, being bad at maps, persisted with GPT-5 when it was asked to list all the states with the letter R in their names. It even offered to map them, with hilarious results:

My goto is to ask LLMs how many states have R in their name. They always fail. GPT 5 included Indiana, Illinois, and Texas in its list. It then asked me if I wanted an alphabetical highlighted map. Sure, why not.

— radams (@radamssmash.bsky.social) August 7, 2025 at 8:40 PM

To be fair to OpenAI, this problem isn’t unique to GPT-5, as similar failures were documented with Google’s Gemini.

In an embarrassing screwup during the livestream, the presentation itself showed charts that appeared to have some of the same kinds of errors.

If this were all described as a “technical preview,” these kinds of mistakes might be understandable. But this is a real product from a company that’s pulling in $1 billion per month. OpenAI’s models are being sold to schools, hospitals, law firms, and government agencies, including for national security and defense.

OpenAI is also telling users that GPT-5 can be used for medical information, while cautioning that the model does not replace a medical professional.

“GPT‑5 is our best model yet for health-related questions, empowering users to be informed about and advocate for their health.”

Why is this so hard?

The reason why such an advanced model can appear to be so capable at complex coding, math, and physics yet fail so spectacularly at spelling and maps is that generative models like GPT-5 are probabilistic systems at their core — they predict the most likely next token based on the volumes of data they have been trained on. They don’t know anything, and they don’t think or have any model of how the world works (though researchers are working on that).
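That next-token step can be illustrated with a toy sketch. The candidate tokens and their scores below are invented for illustration and do not come from GPT-5 or any real model; the point is only that the selection follows the scores, with no check on whether the answer is true.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtract the max before exponentiating for numerical stability.
    exps = {tok: math.exp(s - highest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def pick_next_token(scores):
    """Return the candidate token with the highest probability."""
    probs = softmax(scores)
    return max(probs, key=probs.get)

# Hypothetical scores for the word that follows
# "The number of b's in blueberry is ..." (made-up numbers).
toy_scores = {"two": 2.0, "three": 2.5, "four": 0.5}

probs = softmax(toy_scores)
next_token = pick_next_token(toy_scores)
print(next_token)
```

With these made-up scores the sketch picks "three", because that token scored highest, even though the correct count is two. Real systems usually sample from the distribution rather than always taking the top token, so this is a simplification.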

When the model writes its response, each piece of text you see is the one that scored highest as the likely next token. With math and coding, the rules are stricter and the examples it’s been trained on are consistent, so it is more accurate and can ace the math benchmarks.

But drawing a map with names or counting the letters in a word is weirdly tough, as it requires a skill the model doesn’t really have and has to figure out step by step from patterns, which can lead to odd results. That’s a simplification of a very complex and sophisticated system, but it applies to a lot of the generative-AI technology in use today.
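For contrast, ordinary code handles the same letter questions by checking characters directly rather than predicting text. A minimal sketch, where the function names and the five-state sample list are chosen here for illustration:

```python
def count_letter(word, letter):
    """Count case-insensitive occurrences of one letter in a word."""
    return word.lower().count(letter.lower())

def names_containing(names, letter):
    """Return the names that contain the letter, ignoring case."""
    return [name for name in names if letter.lower() in name.lower()]

# The word from the "blueberry" test quoted above.
b_count = count_letter("blueberry", "b")
print(b_count)

# A small sample, including the three states GPT-5 reportedly listed by mistake.
sample_states = ["Indiana", "Illinois", "Texas", "Oregon", "Florida"]
with_r = names_containing(sample_states, "r")
print(with_r)
```

The first call counts two b's, and the filter keeps only Oregon and Florida; Indiana, Illinois, and Texas drop out because none of them contains an R.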

That’s also a whole lot to explain to users, but OpenAI boils those complicated ideas down to a single warning below the text box: “ChatGPT can make mistakes. Check important info.”

OpenAI did not immediately respond to a request for comment.

More Tech

After Tesla earnings, prediction markets think unsupervised FSD is less likely than ever to be rolled out this year

Tesla’s unsupervised full self-driving technology, which would autonomously ferry passengers around without a human driver having to pay attention, is supposed to help catapult the electric vehicle company’s valuation further into the stratosphere. It was also supposed to be available this year, but prediction market participants, as well as former Tesla self-driving leaders, no longer think that will happen.

On Tesla’s earnings call this week, CEO Elon Musk said the company now had “clarity” on achieving unsupervised full self-driving, something he’s repeatedly said would be available at least in some markets this year.

The comments seemed to give Polymarket prediction market participants some clarity. There, the market-implied probability that Tesla will release unsupervised FSD this year reached its lowest point since the event contract was opened in May.

The odds of it happening had been pretty high up until late June, when Tesla’s long-awaited robotaxi launched with a safety driver in the passenger seat. The unsupervised FSD event contract specifies the feature can have “no requirement for human intervention.”

Rani Molla

Banks prepare record $38 billion debt financing to fund Oracle-tied data centers

Banks led by JPMorgan and Mitsubishi UFJ are preparing a $38 billion debt offering to fund two Oracle-tied data centers in Texas and Wisconsin, Bloomberg reports. The projects, developed by Vantage Data Centers, will support Oracle’s $500 billion Stargate AI infrastructure push with OpenAI and Nvidia.

The loans — $23.25 billion for Texas and $14.75 billion for Wisconsin — are expected to mature in four years, price about 2.5 percentage points higher than the benchmark rate, and mark the largest AI infrastructure financing to date.

Oracle executives recently said that the company anticipates cloud gross margins will reach 35% and that it expects to see $166 billion in cloud infrastructure revenue by FY 2030.

Oracle is up 1.5% premarket.

Rani Molla

Google rises on official announcement of Anthropic deal worth “tens of billions”

Google has made official its deal to expand AI compute for Anthropic, which Bloomberg reported earlier this week. In order to train and serve its Claude model, Anthropic has agreed to pay Google Cloud “tens of billions of dollars” to access up to 1 million tensor processing units, or TPUs, as well as other cloud services.

Google, of course, has a 14% stake in Anthropic, making this one of the many circular AI deals happening at the moment.

“Anthropic and Google have a longstanding partnership and this latest expansion will help us continue to grow the compute we need to define the frontier of AI,” Anthropic CFO Krishna Rao said in the press release. “Our customers — from Fortune 500 companies to AI-native startups — depend on Claude for their most important work, and this expanded capacity ensures we can meet our exponentially growing demand while keeping our models at the cutting edge of the industry.”

The announcement has sent Google up again, more than 1% premarket.

Rani Molla

Report: Snap seeking $1 billion to finance its AR glasses division in “existential” fundraise

Snap is down more than 1% this morning following news that the company is attempting to raise $1 billion for its AR glasses unit in what someone told Sources.news was an “existential” fundraise.

A Snap spokesperson countered, “We do not need to raise money to execute against our plans to publicly launch Specs in 2026, but remain open to opportunities that could accelerate our growth.”

Multiple investors are involved in the talks, including Saudi Arabia’s Public Investment Fund, according to Sources.news. The report also noted that Snap plans to turn the unit that makes its Specs glasses into an independent subsidiary à la Google’s Waymo “that can continue raising capital from investors.”

Snap plans to produce about 100,000 units of next year’s Specs, pricing them around $2,500.

The beleaguered stock saw quite a bit of retail interest last month, amid r/WallStreetBets chatter that its low nominal price made it a potential acquisition target.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.