Power

Judge rules Anthropic training on books it purchased was “fair use,” but not for the ones it stole

Anthropic still faces litigation for training its models on millions of pirated texts.

When AI companies like OpenAI, Anthropic, and Meta were racing to build and train new large language models, they scrambled to find enough text to train their systems on. Countless web pages, photos, YouTube videos, Disney movies, Reddit threads, and book texts were slurped up to feed the models billions and billions of tokens.

Resulting litigation initiated by copyright holders has shown that the legality of the process was on the minds of some AI company employees, like researchers at Meta who raised concerns while training its Llama model, only to be told that the use of LibGen, a corpus of pirated texts, was approved by “MZ.”

But yesterday, a court decided a case partially in favor of AI companies, with far-reaching consequences for all the companies that were sucking copyrighted material into their models.

A federal judge in the Northern District of California has ruled that Anthropic was not violating the copyright of authors of the books it purchased and scanned for training.

A group of authors filed the suit against Anthropic last August, alleging that Anthropic had acknowledged training its Claude AI model using “The Pile,” a mass of text shared online that contained millions of copyrighted works, including some written by the plaintiffs.

Judge William Alsup determined that the process of buying, scanning, and ingesting the text for use in training the Claude model was “exceedingly transformative and was a fair use under Section 107 of the Copyright Act,” in a key test of the fair use doctrine in intellectual property law.

But what about the “over seven million copies of books” that Anthropic admitted were pirated and that it did not pay for? The judge said that was not fair use and warrants its own trial.

Judge Alsup wrote:

“The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use. Anthropic employees said copies of works (pirated ones, too) would be retained ‘forever’ for ‘general purpose’ even after Anthropic determined they would never be used for training LLMs. A separate justification was required for each use. None is even offered here except for Anthropic’s pocketbook and convenience.”

The case is the first of its kind to be decided in the US, and lays out a potentially legal way for AI companies to safely train their models using copyrighted works — as long as they purchase them. That said, there are still many other cases pending and many factors at play before the industry has clear rules.

But companies that are caught knowingly using pirated, copyrighted works to train AI models may face new legal exposure.

An Anthropic spokesperson told Sherwood News:

“We are pleased that the Court recognized that using ‘works to train LLMs was transformative — spectacularly so.’ Consistent with copyright’s purpose in enabling creativity and fostering scientific progress, ‘Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.’”

More Power


OK, so when was the longest shutdown in US history?

The US government officially shut down at 12:01 a.m. on Wednesday after senators failed to agree on a last-minute funding bill. Though initially shrugging off the threat of a shutdown during yesterday’s session, stocks were mildly in the red on Wednesday as investors reacted to what is now the 11th shutdown in the government’s history.

Until this latest shutdown, there had been 20 government funding gaps since 1976, though not all ended in a full shutdown: full closure was averted in half of those cases.

Indeed, prior to the 1980s, funding gaps didn’t typically have major effects on government operations, with agencies continuing to operate on the basis that the funding would come eventually. However, a more stringent interpretation of the rules led to a stricter appropriations process from the early 1980s onward, with many subsequent funding gaps resulting in a shutdown of affected agencies (unless the gaps were quickly fixed or occurred over a weekend).

Obviously, the duration of the latest shutdown is still unclear, but it will continue until Congress passes a funding bill — most likely via a “continuing resolution,” which has ended every shutdown since 1990. Data analyzed by USAFacts suggest that it might not be a one- or two-day affair, as funding gaps have lengthened in recent years.

[Chart: Government shutdown patterns. Source: Sherwood News]

Indeed, the last shutdown, which began in December 2018, ended up becoming the longest in history, at a whopping 34 days. By the time the government reopened in January 2019, about $3 billion (in 2019 dollars) had been wiped from the GDP in Q4, per data from the Congressional Budget Office, with approximately $18 billion in “federal discretionary spending” delayed over the roughly five-week stretch.


GM climbs following upgrade, report that Trump administration seeks stake in its lithium mine partner

Shares of General Motors rose more than 2% in premarket trading Wednesday following an upgrade of the stock by UBS from neutral to buy. The firm also hiked its price target for GM by 45% to $81.

Also likely elevating GM was a Reuters report that the Trump administration is exploring taking a 10% stake in Lithium Americas, the automaker’s partner in the yet-to-open Thacker Pass lithium mine. Shares of Lithium Americas surged 68% in the premarket.

GM, which invested $625 million into the lithium mine last year, holds a 38% stake in the joint venture. The mine is expected to become the Western Hemisphere’s primary lithium source in 2028, when it’s slated to open, producing enough of the metal to make 800,000 electric vehicle batteries.

Prior to its plans for Lithium Americas, the Trump administration last month said it would take a 10% stake in Intel. In July, it announced a 15% stake in rare earths miner MP Materials.


Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.