VEND FOR YOURSELF

Gemini 3 is insanely good at visual reasoning... and running a vending machine

Google’s stock is up, maybe because Gemini 3 is good and it’s powered mostly by Google’s TPUs, or maybe because Alphabet’s about to launch a vending machine business.

David Crowther

How do you measure what an AI model can do?

You ask it to spell strawberry, make a video of Will Smith eating spaghetti, or do some basic math.

But, once you’ve exhausted all of the obvious tests, you might want something a little more formal — and it’s a question that researchers have been grappling with for years.

Now, there is a whole swath of benchmark tests that new AI models are put through, by organizations both independent and not so independent, in an increasingly weird kind of robot arena. Some of the tests are quizzes. Some require verbal, visual, or inductive reasoning. Many ask the large language models to do a lot of math that I cannot do. But one in particular asks a different question:

How much money can this thing make running a vending machine?

Vending-Bench 2, a test created by Andon Labs, puts LLMs through their paces by making them run “a simulated vending machine business over a year,” scoring them not on how many questions they got right out of 100, but on how much cash was left in their virtual piggy banks at the end of the year.

This, it turns out, is hard for LLMs, which are prone to going off on tangents and losing focus, and are generally quite poor at optimizing for long-term outcomes. That makes sense when you consider that the core of many of the AI models we use every day is, “What’s the most likely bit of text/pixel/image to come after this bit of text/pixel/image?”

Per Andon Labs, in the Vending-Bench 2 test:

“Models are tasked with making as much money as possible managing their vending business given a $500 starting balance. They are given a year, unless they go bankrupt and fail to pay the $2 daily fee for the vending machine for more than 10 consecutive days, in which case they are terminated early. Models can search the internet to find suitable suppliers and then contact them through e-mail to make orders. Delivered items arrive at a storage facility, and the models are given tools to move items between storage and the vending machine. Revenue is generated through customer sales, which depend on factors such as day of the week, season, weather, and price.”
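To make the survival rule in that description concrete, here is a minimal sketch in Python. It is not Andon Labs' code: the function name, the `RunResult` type, and the idea of feeding in a list of daily net income figures are all assumptions for illustration, and the real benchmark also simulates suppliers, e-mail, storage, and demand. Only the $500 starting balance, the $2 daily fee, and the "more than 10 consecutive days" cutoff come from the quoted rules. How unpaid fees are handled (here they are simply skipped, not accrued) is also an assumption.

```python
from dataclasses import dataclass
from typing import Sequence

STARTING_BALANCE = 500.0  # from the quoted rules
DAILY_FEE = 2.0           # from the quoted rules
MAX_MISSED_DAYS = 10      # run ends after MORE than 10 consecutive missed fees
DAYS_IN_YEAR = 365


@dataclass
class RunResult:
    final_balance: float
    days_survived: int
    terminated_early: bool


def simulate(daily_net_income: Sequence[float],
             starting_balance: float = STARTING_BALANCE) -> RunResult:
    """Apply the fee and termination rule to a hypothetical income stream.

    daily_net_income[i] is sales minus purchase costs on day i+1
    (an invented input; the benchmark derives this from model actions).
    """
    balance = starting_balance
    missed = 0
    days = list(daily_net_income)[:DAYS_IN_YEAR]
    for day, net in enumerate(days, start=1):
        balance += net
        if balance >= DAILY_FEE:
            balance -= DAILY_FEE
            missed = 0  # a successful payment resets the streak
        else:
            missed += 1  # assumption: the unpaid fee is skipped, not accrued
            if missed > MAX_MISSED_DAYS:
                return RunResult(balance, day, True)
    return RunResult(balance, len(days), False)


idle = simulate([0.0] * DAYS_IN_YEAR)     # a model that never sells anything
steady = simulate([10.0] * DAYS_IN_YEAR)  # a model that nets $10 every day
print(idle)
print(steady)
```

Under these assumptions, a model that does nothing burns through its $500 in fees and is cut off partway through the year, while even a modest steady profit keeps the run alive for all 365 days.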

Running the model for “a year” results in as many as 6,000 messages in total, and a model “averages 60-100 million tokens in output during a run,” according to Andon.

In the simulation, the AI model has to negotiate with suppliers as well as deal with costly refunds, delayed deliveries, bad weather, and price scammers.

Google’s Gemini 3 Pro, it turns out, is the best of any model tested yet, ending the year with $5,478 in its account, considerably more than Anthropic’s Claude Sonnet 4.5, Grok 4, and GPT-5.1. That’s thanks to its relentless negotiating skills. Per Andon, “Gemini 3 Pro consistently knows what to expect from a wholesale supplier and keeps negotiating or searching for new suppliers until it finds a reasonable offer.”

Gemini 3 Vending Machine benchmark
Andon Labs / Vending-Bench 2

OpenAI’s model is, apparently, too trusting. Andon Labs hypothesizes that its relatively weak performance “comes down to GPT-5.1 having too much trust in its environment and its suppliers. We saw one case where it paid a supplier before it got an order specification, and then it turned out the supplier had gone out of business. It is also more prone to paying too much for its products, such as in the following example where it buys soda cans for $2.40 and energy drinks for $6.” Anyone who’s had ChatGPT sycophantically tell them they’re a genius for uttering even the most half-baked idea might understand how this can happen.

For what it’s worth, the $5,000 and change that Gemini averaged over its runs is considered pretty poor relative to what a smart human might be able to do, with Andon Labs estimating that a “good” strategy could make roughly $63,000 in a year.
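For a rough sense of that gap, the figures quoted in this article can be compared directly. The snippet below is only back-of-the-envelope arithmetic on the $5,478 ending balance, the $500 starting balance, and Andon Labs' roughly $63,000 estimate; the variable names are made up for illustration.

```python
# Figures quoted in the article
gemini_final = 5_478      # Gemini 3 Pro's ending balance, in dollars
starting_balance = 500    # every model's starting balance
good_strategy = 63_000    # Andon Labs' rough estimate for a "good" strategy

# Share of the "good" benchmark that Gemini reached, as a percentage
share_of_good = gemini_final / good_strategy * 100

# How many times over Gemini multiplied its starting cash
multiple_of_start = gemini_final / starting_balance

print(f"Share of the 'good' strategy: {share_of_good:.1f}%")
print(f"Multiple of starting balance: {multiple_of_start:.2f}x")
```

In other words, the best model tested grew its money roughly elevenfold and still landed at under a tenth of what Andon Labs thinks a good strategy could earn.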

What do you bench?

Diet Coke negotiations aside, Gemini’s scores on more traditional AI benchmarks were also impressive — at least, according to Google. A table posted on the company’s blog shows that Gemini 3 Pro tops or matches its peers in all but one of the benchmarks.

Gemini 3 benchmarks
Google / Alphabet

Its scores on visual reasoning tests — such as the ARC-AGI-2 test, where it scored 31.1%, way ahead of Anthropic’s and OpenAI’s best efforts — are particularly impressive. On ScreenSpot-Pro, a test that basically asks models to locate certain buttons or icons from a screenshot, Gemini 3 is leaps and bounds ahead of its rivals, scoring 72.7%. (GPT-5.1 scored just 3.5%.)

With Alphabet’s full tech stack responsible for the Gemini models, investor reaction to the release has been very positive so far, building on a wave of good news for the search giant this week. As my colleague Rani Molla wrote:

“[Gemini’s] performance is crucial to Google’s future success as the company embeds its AI models across its products and relies on them to generate new revenue from existing lines — particularly by driving growth in Cloud and reinforcing its ad and search dominance.”

Go Deeper: Check out Vending-Bench 2.

More Tech

Google’s AI chip business could be a $900 billion boon for the company

Google may be sitting on a massive new business that it has yet to fully exploit.

Google’s custom tensor processing unit (TPU) AI chips have been getting a lot of attention recently, leaving the tech world wondering whether there are ways to power its AI dreams other than Nvidia’s GPUs.

Bloomberg spoke with analysts who estimate that, if it does decide to sell its chips to others, Google could capture 20% of the AI market, making it a $900 billion business. For comparison, Google Cloud pulled in $43.2 billion of revenue last year.

Even if Google just sticks with renting access to its TPUs, it will continue to drive down costs and increase margins as it ekes out performance improvements, such as the 30x improvement in power efficiency that the latest generation of TPUs has delivered for the company.

OpenAI’s Sam Altman has explored bringing his feud with Tesla’s Elon Musk to space

Billionaires, they’re just like us: they want to bring their terrestrial beefs to outer space.

OpenAI CEO Sam Altman has explored buying or partnering with a rocket company to compete with Tesla CEO Elon Musk’s SpaceX, The Wall Street Journal reports. The two billionaires have had numerous public feuds over the years that have played out in the courts and on social media. They also both lead AI companies that have insatiable needs for data centers and have publicly discussed building data centers in space.

Altman seems to think this could be more than science fiction. He reportedly reached out to rocket maker Stoke Space about potentially making equity investments in the company to gain a controlling stake, though the talks are no longer active, WSJ reports.

Or perhaps he just wanted a Sherwood bobblehead of himself.

Report: Meta to slash metaverse, VR spending by up to 30%

Four years after changing its name to reflect its focus on the loosely defined “metaverse,” Meta is planning deep cuts to the company’s money-losing virtual reality efforts, according to a report from Bloomberg.

Meta’s Reality Labs division, home to the teams working on metaverse products — which include Quest VR headsets, Horizon Worlds, and its Ray-Ban Meta glasses — has lost about $70 billion since the company started breaking out the unit in 2020.

The company has struggled to get consumers to buy into CEO Mark Zuckerberg’s vision of working and playing in virtual reality worlds, like the company’s Horizon Worlds platform.

Investors seem to love the news of the pivot, as shares shot up as much as 5% in early trading.

Meta’s recent hiring spree of AI superstars from competitors for its Meta Superintelligence Labs shows that the company is now all in on AI.

The best quotes from Salesforce’s earnings call

CEO Marc Benioff doesn’t disappoint.

Salesforce jumps as Q3 earnings top expectations

Salesforce jumped after-hours Wednesday as it posted earnings and guidance that beat analysts’ expectations. Its adjusted earnings per share came in at $3.25 for the third quarter of fiscal 2026, above the FactSet analyst consensus estimate of $2.86. Its revenue rose 9% to $10.3 billion, in line with expectations.

The software-as-a-service company issued fourth-quarter revenue guidance of $11.13 billion to $11.23 billion, well above the $10.9 billion analysts had predicted. It also forecast adjusted earnings of $3.02 to $3.04 per share, compared with analysts’ expectations of $3.04.

Shares were up 4.3% in recent trading.

“Our Agentforce and Data 360 products are the momentum drivers,” CEO Marc Benioff said in the press release.

Last quarter, Salesforce shares fell after the company issued disappointing third-quarter guidance. Coming into today’s report, the stock was down about 30% year to date.

Investors will be watching the earnings call closely for updates on the company’s AI strategy — particularly progress on Agentforce and broader adoption of its AI-driven cloud offerings.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.