Jon Keegan

Quit the yapping: New AI technique could cut costs 90% by saying less

A consensus is emerging in AI circles that the way forward involves models that use "chain of thought" reasoning to get better performance, at the expense of costlier computing resources. The technique involves instructing the model to break a problem down into detailed, step-by-step reasoning before answering. The problem is that those steps can be pretty verbose, and when it comes to AI, more words = more cost.

A new paper from researchers at Zoom introduces a technique dubbed "chain of draft": instead of writing out each reasoning step as a wordy sentence, the model is told to limit each step to a succinct "draft" of about five words. The researchers found that responses can remain just as accurate while computing costs drop by up to 90%.
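To see what that looks like in practice, here's a minimal sketch of a chain-of-draft-style prompt next to a standard chain-of-thought prompt, using the OpenAI Python client. The instruction wording is a paraphrase of the idea, not the paper's verbatim prompt, and the model and example question are illustrative assumptions.

```python
# Sketch: chain-of-thought vs. chain-of-draft prompting.
# Prompt wording is illustrative, not the paper's verbatim instructions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Standard chain of thought: invites full-sentence reasoning.
COT = "Think step by step to answer the question, showing your reasoning."

# Chain of draft: cap each reasoning step at roughly five words.
COD = (
    "Think step by step, but keep only a minimal draft for each step, "
    "five words at most. Give the final answer after '####'."
)

question = "A store sells pens at $2 each. How much do 17 pens cost?"

for label, system in [("chain of thought", COT), ("chain of draft", COD)]:
    response = client.chat.completions.create(
        model="gpt-4o",  # one of the larger models the paper tested
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    answer = response.choices[0].message.content
    print(f"--- {label} ({len(answer)} chars) ---\n{answer}\n")
# The draft-style reply might read "17 pens; $2 each; 17 x 2 = 34 #### $34",
# a fraction of the tokens a sentence-by-sentence answer would use.
```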

AI models are priced by the number of "tokens" (portions of words) that the model takes in and puts out. For example, OpenAI's o3-mini "reasoning" model costs $1.10 per million input tokens and $4.40 per million output tokens. That may seem cheap, but when you're processing millions of queries, it can really add up.
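A quick back-of-envelope calculation at those o3-mini rates shows the scale; the query volume and per-response token count below are hypothetical.

```python
# Back-of-envelope output-token costs at o3-mini's quoted rate.
# Query volume and token counts are hypothetical illustrations.
OUTPUT_RATE = 4.40 / 1_000_000  # dollars per output token

queries = 10_000_000      # hypothetical queries per month
tokens_per_reply = 500    # hypothetical verbose reasoning output per query

verbose_cost = queries * tokens_per_reply * OUTPUT_RATE
draft_cost = verbose_cost * 0.10  # chain of draft: up to ~90% fewer tokens

print(f"Verbose reasoning: ${verbose_cost:,.0f}/month")  # $22,000
print(f"Chain of draft:    ${draft_cost:,.0f}/month")    # $2,200
```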

“By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT in accuracy while using as little as only 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks,” the paper reports.

Translation: it’s faster, cheaper, and sometimes better than chain of thought.

This approach is also notable for its ease of use: there's no retraining or special tooling involved, just a change to the prompt. That said, most of the gains showed up on larger models like OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet, while smaller models saw weaker results.

Go deeper: Here are OpenAI’s 50 Laws of Robotics

More Tech

Driverless Waymo struck a child near school in California

A driverless Waymo vehicle struck a child near a Santa Monica elementary school during morning drop-off last week, as robotaxis from Waymo, Tesla, and others continue their expansion across the country. In a blog post, Waymo said the fully driverless car detected the child as they emerged from behind a parked SUV, braked hard, and slowed from roughly 17 mph to under 6 mph before making contact. The child suffered minor injuries and walked away.

The company reported the incident to the National Highway Traffic Safety Administration, which is currently investigating, adding fresh scrutiny to how robotaxis perform in the wild.

Digging into Microsoft’s cloud backlog

Microsoft’s Azure cloud computing unit is seeing huge demand. In yesterday’s second-quarter earnings call, Microsoft CFO Amy Hood said the company’s commercial bookings increased 230% thanks to large commitments from OpenAI and Anthropic and healthy demand for its Azure cloud computing platform.

Hood said that the company’s “remaining performance obligations” (RPO) ballooned to a staggering $625 billion, up 110% from the same period last year. How long will it take for Microsoft to fulfill these booked services? Hood said the weighted average duration was “approximately two and a half years,” but a quarter of that will be recognized in revenue in the next 12 months.
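For readers who want the implied numbers, the arithmetic works out as follows; this is just a restatement of the figures above, not a new disclosure.

```python
# Implied arithmetic from Microsoft's reported RPO figures.
rpo = 625e9        # remaining performance obligations, in dollars
yoy_growth = 1.10  # up 110% from a year earlier

year_ago_rpo = rpo / (1 + yoy_growth)  # implied year-ago balance
next_12_months = rpo * 0.25            # the quarter recognized within a year

print(f"Implied year-ago RPO: ${year_ago_rpo / 1e9:,.0f}B")              # ~$298B
print(f"Recognized over next 12 months: ${next_12_months / 1e9:,.0f}B")  # ~$156B
```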

Shares of Microsoft tanked today, down over 11%, despite strong beats on revenue and earnings. Investors may be concerned that the extra demand, huge as it is, came only from OpenAI, an issue Oracle recently experienced.

But Hood said the non-OpenAI RPO still grew 28% year on year, which reflects “ongoing broad customer demand across the portfolio.”

Meta and Tesla are funding the future with their core businesses — but only one of them is still growing

The two tech giants, on back-to-back earnings calls, made it sound like they're selling the same AI-powered future. But the picture of the underlying businesses, and how they're using AI to fuel current sales, couldn't be more different.

