Meta scrambling to defend its AI after Llama 4 benchmark bungle

This weekend, Meta surprised everyone and released two flavors (“Maverick” medium and “Scout” small) of its highly anticipated Llama 4 AI model. Llama 4’s release is a big deal, as the company has been hyping it up as the key to its AI plans in the coming year.

When a major new model drops, people do two things: check to see how the model scored on major benchmarks, and load up the model and kick the tires.

Llama 4 posted some eye-popping results on Chatbot Arena, a popular human-powered benchmark that works as a sort of blind taste test for AI models, with side-by-side results. But after looking at the fine print, some in the community cried foul, as Meta achieved the higher score using an “experimental chat version” of Llama 4 that was not available to the public.

A footnote to a chart that highlighted Llama 4’s standout score read “LMArena testing was conducted using Llama 4 Maverick optimized for conversationality.”

In response to the controversy, LMArena (which runs the Chatbot Arena benchmark) updated its guidelines for testing:

“Meta’s interpretation of our policy did not match what we expect from model providers. Meta should have made it clearer that ‘Llama-4-Maverick-03-26-Experimental’ was a customized model to optimize for human preference. As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.”

This led to some unfounded accusations that Meta had trained its model on test datasets — akin to giving a kid the answers to a quiz before having them take the test.

To quell the firestorm of questions surrounding the model’s release, Meta’s head of generative AI, Ahmad Al-Dahle, denied the claims in a post on X yesterday.

The release was also unusual for what was missing: the extra-large version of the model, named “Behemoth.” Meta said the model was still being trained, but boasted about its performance nonetheless.

“Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we’re excited to share more details about it even while it’s still in flight.”

Meta did not immediately respond to a request for comment.

More Tech

Amazon raises the price for ad-free Prime Video to $4.99

Amazon is giving consumers more — for more. The e-commerce giant is raising the price of its ad-free Prime Video tier to $4.99 a month, up from $2.99.

On April 10, the service, now rebranded as Prime Video Ultra, will allow more concurrent streams (five instead of three) and up to 100 downloads, up from 25. Ad-free Prime Video had been included with a Prime membership until 2024, when Amazon added ads and began charging $2.99 a month to remove them.

For what it’s worth, ad-free Prime Video is still cheaper than the other increasingly expensive streaming services — if you don’t include the cost of Prime.

Uber relaunches robotaxi service with Hyundai-backed Motional in Las Vegas

What happens in Vegas, keeps happening in Vegas.

Uber users in Las Vegas can now be matched with an electric Motional IONIQ 5 robotaxi along parts of the Strip and at select casinos, resorts, and the Town Square shopping district near the airport, the companies said. For now, each vehicle includes a human safety operator monitoring from behind the wheel, who the companies say will be removed by year’s end.

Uber and Hyundai-backed autonomous tech company Motional previously tested a service there in 2022. “Motional is ready to put our extensive ride hail experience to work with Uber again,” said David Carroll, vice president of commercialization at Motional, which paused its commercial deployments in 2024 to refocus on its core driverless technology after scaling back operations.

This time around, the companies will be joining a much more crowded field. Amazon-owned Zoox has been offering free rides along select destinations on the Strip since last year, and both Tesla’s Robotaxi and Alphabet-owned Waymo have plans to open up shop there in the near future.

Thanks to a spate of recent AV partnerships, Uber, which sold its own autonomous unit back in 2020, is finding itself at the center of the nascent robotaxi boom.

Musk says “xAI was not built right” amid executive departures, Cursor hires

There’s been a lot of turnover lately at xAI, with numerous executive departures and, yesterday, news that the SpaceX-owned company was hiring two senior leaders from Cursor, an AI coding startup that’s raising funds at a $50 billion valuation.

The reason? “xAI was not built right first time around, so is being rebuilt from the foundations up,” CEO Elon Musk posted on xAI-owned X yesterday, in response to a post about the Cursor hires. Earlier this month, Musk told a conference audience, “Grok is currently behind on coding.”

The news amounts to an admission of a reset inside xAI and an acknowledgment that the company is trailing AI peers like Anthropic and OpenAI in one of AI’s most commercially important applications: coding.

War in the Middle East halts Meta’s undersea fiber project

Meta’s massive undersea cable project connecting Africa and the Middle East to Europe has run into an unexpected obstacle, not under the sea but in the sky and on the land above it: the war in the Middle East.

According to a report from Bloomberg, France’s Alcatel Submarine Networks, the company that is laying the cable, notified customers that it can no longer safely operate in the area.

The 2Africa project consists of a 45,000-kilometer chain of undersea fiber-optic cables that encircles Africa and runs through the Red Sea, up through the Gulf of Oman, where the Strait of Hormuz sits. Iran has declared the strait — a crucial choke point for oil and natural gas tankers — closed for traffic.

Meta is building the network in partnership with Bayobab, China Mobile, Orange, Telecom Egypt, Vodafone, WIOCC, and Center3.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, Robinhood Derivatives, LLC, or Robinhood Money, LLC. Futures and event contracts are offered through Robinhood Derivatives, LLC.