Tech

Big Tech's onslaught of bots forces publishers to play an impossible game of Whac-A-Mole

In the battle to protect their content from data-hungry AI scraping bots, publishers have to rely on a single text file for their defense.

For thirty years, website owners have used the humble "robots.txt" file to tell automated scrapers what content they will allow to be indexed and what they want kept out of search engines.

But as tech companies race to ingest as much content as possible to train their AI models, the robots.txt file has also become the only place where publishers can refuse to be scraped and potentially used for AI training, provided they know exactly which scrapers to block. Scrapers identify themselves with names like Google's "googlebot," Meta's "facebookbot," or OpenAI's "gptbot," which appear in the "user agent" field of the web page request.

Publishers must now play an escalating game of Whac-A-Mole, adding each new scraper (like the one Meta recently let loose) to their robots.txt files as it pops up. Once a site has been scraped for AI training without permission, content owners have little recourse other than the courts.
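To make the mechanics concrete, here is a minimal sketch in Python of how a well-behaved scraper would read a robots.txt file. It uses the standard library's `urllib.robotparser`. The file contents, the `is_allowed` helper, the example URL, and the bot name "BrandNewBot" are all hypothetical and are not taken from any real site. The sketch only models a crawler that chooses to honor the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not copied from any real site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str,
               url: str = "https://example.com/story") -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "BrandNewBot" stands in for a scraper the publisher has not heard of yet.
    for bot in ("GPTBot", "Meta-ExternalAgent", "BrandNewBot"):
        status = "allowed" if is_allowed(ROBOTS_TXT, bot) else "blocked"
        print(f"{bot}: {status}")
```

In this sketch the two named bots come back blocked, while the unlisted "BrandNewBot" falls through to the catch-all rule and is allowed. That gap is the Whac-A-Mole problem: a bot is only refused once someone adds its name to the file.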

Data journalist Ben Welsh's homepages.news project collects automated snapshots of top news websites, as well as the contents of their robots.txt files. In a recent sample of Welsh's data from Aug. 16-17, about 40% of top news sites blocked all scrapers. The most blocked scraper was OpenAI's "gptbot," which about 24% of the news sites blocked. Meta's new "Meta-ExternalAgent" bot, which appeared in July, was blocked by only around 17% of sites.

Earlier this year, the Reuters Institute published a report that found that by the end of 2023, 79% of US-based news websites were blocking OpenAI's bot.
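A figure like "share of sites blocking a given bot" could be computed from a collection of robots.txt files along the following lines. This is a rough sketch, not the method used by homepages.news. The site names and file contents in `SAMPLE` are invented purely for illustration, the `blocks` and `share_blocking` helpers are hypothetical, and "blocking" is assumed here to mean that the bot may not fetch the site's root path.

```python
from urllib.robotparser import RobotFileParser


def blocks(robots_txt: str, user_agent: str) -> bool:
    """Return True if this robots.txt disallows the site root for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


def share_blocking(robots_by_site: dict[str, str], user_agent: str) -> float:
    """Fraction of sites whose robots.txt blocks user_agent at the root."""
    if not robots_by_site:
        return 0.0
    blocked = sum(blocks(text, user_agent) for text in robots_by_site.values())
    return blocked / len(robots_by_site)


# Invented sample data for illustration only; not real sites or real files.
SAMPLE = {
    "site-a.example": "User-agent: *\nDisallow: /\n",          # blocks everyone
    "site-b.example": "User-agent: GPTBot\nDisallow: /\n",     # blocks one bot
    "site-c.example": "User-agent: *\nDisallow: /private/\n",  # partial rule only
    "site-d.example": "",                                      # no rules at all
}

if __name__ == "__main__":
    for bot in ("GPTBot", "Meta-ExternalAgent"):
        print(f"{bot}: {share_blocking(SAMPLE, bot):.0%} of sample sites block it")
```

A site with a blanket `User-agent: *` ban counts as blocking every bot, so whether such sites are included in a per-bot figure is a methodological choice that changes the result.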

The entire robots.txt mechanism is voluntary, and many companies have been caught ignoring the files altogether. If a company changes the name of its bot, or releases a new one whose name doesn't identify the company, publishers may not know to block it.

While the AI5 and AI6 will be made by TSMC and Samsung, respectively, Musk has said Tesla eventually aims to manufacture its future AI chips at Tesla’s upcoming Terafab factory in Austin.

tech

NHTSA expands Tesla FSD probe, focusing on whether system can detect when cameras can’t see the road

The National Highway Traffic Safety Administration said it is expanding its probe of Tesla’s Full Self-Driving system to an engineering analysis covering about 3.2 million Teslas, a majority of the company’s vehicles on the road in the US, Reuters reports.

The agency is focusing on Tesla’s “degradation detection system,” which is meant to recognize when its camera-based technology cannot reliably perceive the road and prompt drivers to intervene:

“Available incident data raise concerns that Tesla’s degradation detection system, both as originally deployed and later updated, fails to detect and/or warn the driver appropriately under degraded visibility conditions such as glare and airborne obscurants. In the crashes that ODI has reviewed, the system did not detect common roadway conditions that impaired camera visibility and/or provide alerts when camera performance had deteriorated until immediately before the crash occurred.”

Tesla CEO Elon Musk has long argued that the company’s self-driving approach does not require the expensive lidar sensors used by rivals such as Waymo.

$1B

Apple is behind the rest of Big Tech when it comes to developing its own AI, but that hasn’t stopped it from cashing in on the AI boom. The iPhone maker stands to bring in more than $1 billion in App Store fees this year from other companies’ generative-AI apps, mostly from ChatGPT, The Wall Street Journal reports, citing data from App Magic.

Unlike rivals pouring hundreds of billions into AI infrastructure, Apple’s spending has been relatively modest, with its overall capital expenditure actually declining last quarter. Its lucrative App Store model lets Apple profit from AI as a gatekeeper without fully joining the expensive race to build it.

OpenAI is shipping everything. Anthropic is perfecting one thing.

The two AI titans are in a race to grow revenues, but they have very different strategies for releasing products. And one approach appears to be winning out.

73%

Here’s another sign Anthropic’s enterprise tools are killing it: the AI firm now captures 73% of all spending among companies buying AI tools for the first time, Axios reports, citing data from Ramp, a fintech company that provides corporate cards and expense management software. That’s up from 50% in January, when it was tied with OpenAI.

As we’ve noted, Big Tech is pivoting from experimentation to revenue — and enterprise is where that shift is playing out.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, Robinhood Derivatives, LLC, or Robinhood Money, LLC. Futures and event contracts are offered through Robinhood Derivatives, LLC.