Meta scrambling to defend its AI after Llama 4 benchmark bungle

This weekend, Meta surprised everyone and released two flavors (“Maverick” medium and “Scout” small) of its highly anticipated Llama 4 AI model. Llama 4’s release is a big deal, as the company has been hyping it up as the key to its AI plans in the coming year.

When a major new model drops, people do two things: check to see how the model scored on major benchmarks, and load up the model and kick the tires.

Llama 4 posted some eye-popping results on Chatbot Arena, a popular human-powered benchmark that works as a sort of blind taste test for AI models, with side-by-side results. But after looking at the fine print, some in the community cried foul, as Meta achieved the high score using an “experimental chat version” of Llama 4 that was not available to the public.

A footnote to a chart that highlighted Llama 4’s standout score read “LMArena testing was conducted using Llama 4 Maverick optimized for conversationality.”

In response to the controversy, LMArena (which runs the Chatbot Arena benchmark) updated its guidelines for testing:

“Meta’s interpretation of our policy did not match what we expect from model providers. Meta should have made it clearer that ‘Llama-4-Maverick-03-26-Experimental’ was a customized model to optimize for human preference. As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.”

This led to some unfounded accusations that Meta had trained its model on test datasets — akin to giving a kid the answers to a quiz before having them take the test.

To quell the firestorm of questions surrounding the model’s release, Meta’s head of generative AI, Ahmad Al-Dahle, denied the claims in a post on X yesterday.

The release was also unusual for what it left out: the extra-large version of the model, named “Behemoth.” Meta said the model was still being trained, but boasted about its performance nonetheless.

“Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we’re excited to share more details about it even while it’s still in flight.”

Meta did not immediately respond to a request for comment.

More Tech

BofA doesn't expect Tesla's ride-share service to have an impact on Uber or Lyft this year

Analysts at Bank of America Global Research compared Tesla’s new Bay Area ride-sharing service with its rivals and found that, for now, it's not much competition for Uber and Lyft. “Tesla scale in SF is still small, and we don't expect impact on Uber/Lyft financial performance in '25,” they wrote.

Tesla is operating an unknown number of cars with drivers using supervised full self-driving in the Bay Area, and roughly 30 autonomous robotaxis in Austin. The company has allowed the public to download its Robotaxi app and join a waitlist, but it hasn’t said how many people have been let in off that waitlist.

While the analysts found that Tesla ride shares are cheaper than traditional ride-share services like Uber and Lyft, the wait times are a lot longer (9-minute waits on average, when cars were available at all) and the process has more friction. They also said the “nature of [a] Tesla FSD ‘driver’ is slightly more aggressive than a Waymo,” the Google-owned company that’s currently operating 800 vehicles in the Bay Area.

APPLE INTELLIGENCE

Apple AI was MIA at iPhone event

A year and a half into a bungled rollout of AI into Apple’s products, Apple Intelligence was barely mentioned at the “Awe Dropping” event.

Jon Keegan
9/10/25

Oracle’s massive sales backlog is thanks to a $300 billion deal with OpenAI, WSJ reports

OpenAI has signed a massive deal to purchase $300 billion worth of cloud computing capacity from Oracle, according to a report from The Wall Street Journal.

The report notes that the five-year deal would be one of the largest cloud computing contracts ever signed, requiring 4.5 gigawatts of capacity.

The news is prompting Oracle shares to pare some of their massive gains, presumably because of concerns about counterparty and concentration risk.

Yesterday, Oracle shares skyrocketed as much as 30% in after-hours trading after the company forecast that revenue from its cloud infrastructure business would climb to $144 billion by 2030.

Oracle shares were up as much as 43% on Wednesday.

It’s the second example in under a week of how much OpenAI’s cash burn and fundraising efforts are playing a starring role in the AI boom: the Financial Times reported that OpenAI is also the major new Broadcom customer that has placed $10 billion in orders.

Large companies have started to drop AI from their businesses

Census data shows drop in large companies using AI

AI appears to be everywhere, but that doesn’t mean big companies have fully embraced the use of the technology in their day-to-day business.

Report: Microsoft adds Anthropic alongside OpenAI in Office 365, citing better performance

In a move that could test its fraught $13 billion partnership, Microsoft is moving away from relying solely on OpenAI to power its AI features in Office 365 and will now also include Anthropic’s Claude Sonnet 4 model, according to a report from The Information.

The move is a tectonic shift that boosts Anthropic’s standing, heightens risks for OpenAI, and has huge ramifications for the balance of power in the fast-moving AI field.

Per the report, Microsoft executives found that Anthropic’s AI outperformed OpenAI’s on tasks involving spreadsheets and generating PowerPoint slide decks, both crucial parts of Microsoft’s Office 365 productivity suite.

Microsoft will have to pay the competition to provide the services: Amazon Web Services currently hosts Anthropic’s models while Microsoft’s Azure cloud service does not, The Information reported.

OpenAI is also reportedly working on its own productivity suite of apps.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.