How well can top AI models do these jobs?

An OpenAI benchmark tests how well AI models can perform “economically valuable” jobs.

One of the biggest fears fueling the public’s apprehension toward AI is that the technology will eventually take people’s jobs.

We’ve already seen evidence that some roles, like entry-level software development, customer service, and marketing, are feeling the effects of automation powered by generative AI. Tracking the real-world work capabilities of AI models will become increasingly important as they grow more powerful.

To that end, OpenAI has created a new AI benchmark called “GDPval” that aims to measure just how well leading AI models can do realistic tasks for a variety of “economically valuable” jobs.

OpenAI describes the benchmark as an evolutionary step away from the first wave of benchmarks that followed a more academic, exam-style model:

“[GDPval] measures model performance on tasks drawn directly from the real-world knowledge work of experienced professionals across a wide range of occupations and sectors, providing a clearer picture on how models perform on economically valuable tasks. Evaluating models on realistic occupational tasks helps us understand not just how well they perform in the lab, but how they might support people in the work they do every day.”

Working with experienced industry professionals, the researchers created a dataset of 220 realistic tasks, spanning 44 occupations, that someone might do in the course of their work in a particular role.

Here’s an example of one of the tasks in the benchmark’s training data for a real estate broker:

Sample task for a real estate broker from the GDPval benchmark’s training dataset (Huggingface.co)

We went through the data and picked a few common jobs from the benchmark’s results. Unsurprisingly, software development was the most affected job: Anthropic’s Claude model averaged a 70% win rate when its output was compared with that of a human in the role. For reference, a score of 50% would put the model on par with a human expert. Audio and video technicians should feel that their job is secure (for now), as the models scored very low on those tasks.
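To make the win-rate figure concrete, here is a minimal Python sketch of how such a score could be tallied from head-to-head comparisons. It rests on assumptions that are not taken from OpenAI's paper: the `win_rate` function, the three outcome labels, the choice to count a tie as half a win (so that 50% means parity with the human), and the sample grades are all hypothetical.

```python
from collections import Counter

VALID_OUTCOMES = {"win", "tie", "loss"}


def win_rate(outcomes):
    """Return the model's win rate over head-to-head comparisons.

    Assumption for illustration: a tie counts as half a win, so a
    score of 0.5 means the model is on par with the human expert.
    """
    if not outcomes:
        raise ValueError("need at least one comparison")
    counts = Counter(outcomes)
    unknown = set(counts) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unexpected outcome labels: {sorted(unknown)}")
    return (counts["win"] + 0.5 * counts["tie"]) / len(outcomes)


# Hypothetical grades for ten tasks, not real GDPval data.
sample = ["win"] * 6 + ["tie"] * 2 + ["loss"] * 2
print(f"{win_rate(sample):.0%}")  # prints 70%
```

Under this scoring, a model that wins exactly as often as it loses lands at 50%, which matches the parity reading described above.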

OpenAI acknowledges that the benchmark has limitations. For instance, each task currently comes with background materials that are required to do the task, but generating those materials is itself complex work, and the benchmark doesn’t assess current models’ ability to complete that preparatory work. Instead, it is done by the humans testing the AI. The paper also notes that this is a small dataset, and that the jobs currently tested are mainly “knowledge worker” roles that can be performed on a computer.

Maybe a future version will be used to test how well a robot can scrub your toilet.

More Tech


Anthropic reportedly doubles current fundraising round to $20 billion

Anthropic has doubled its current fundraising round to $20 billion on strong investor demand, according to reporting from the Financial Times. The new fundraising round would value the company at a staggering $350 billion. That’s up 91% from September, when it raised at a valuation of $183 billion.

The company reportedly received interest totaling 5x to 6x its original $10 billion fundraising goal, and it’s expected to haul in several billion more than that tally before the current round closes.

Anthropic’s success with enterprise customers and the popularity of its Claude Code product are boosting the company’s momentum as it chases the current valuation leader of the AI startup pack: OpenAI.

Amazon says it’s doubling down on opening Whole Foods stores. That sounds familiar.

The company says it’ll open 100 Whole Foods locations in the next few years. That sounds similar to plans Whole Foods’ CEO laid out in 2024 for opening 30 stores a year. Since then, it appears to have added 14, total.

One year after the DeepSeek freak, the AI industry has adjusted and roared back

A look back at how the Chinese startup shattered conventions, changed the way Big Tech thought about AI, and blew a $1 trillion hole in the stock market that got filled right back up... and then soared to new levels.

Georgia lawmakers introduce data center construction moratorium amid statewide pushback

More and more communities across the US are wrestling with the pros and cons of having a data center come to town. Georgia has become a hotspot of resistance to the data centers planned by Big Tech, according to a new report from The Guardian. The Atlanta metro area led the nation in data center construction in 2024.

Georgia state representatives introduced legislation that would place a one-year moratorium on data center construction in the state. Ten Georgia municipalities have already passed local bans on data centers.

Per the report, at least three other states have seen similar data center moratorium legislation introduced in the last week, including Maryland and Oklahoma.


Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.