Anthropic researchers crack open the black box of how LLMs “think”
OK, first off — LLMs don’t think. They are clever statistical systems that map tokens to underlying concepts through weighted connections and use them to predict what text comes next. Got it?
But exactly how a model goes from the user’s prompt to “reasoning” its way to an answer has been the subject of great speculation.
Models are trained, not programmed, and there are definitely weird things happening inside these tools that no human explicitly designed. As the industry struggles with AI safety and hallucinations, understanding this process is key to developing trustworthy technology.
Researchers at the AI startup Anthropic have devised a way to perform a “circuit trace” that lets them follow the pathways a model takes between concepts as it works from a prompt to an answer. Their paper sheds new light on this mysterious process, much like a real-time fMRI brain scan can show which parts of the human brain “light up” in response to different stimuli.
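To be clear about what follows: the sketch below is not Anthropic’s method, which builds attribution graphs over interpretable features extracted from the model. It is only a toy illustration of the more basic idea behind this kind of interpretability work: intervene on a model’s internal units and see which ones the final answer actually depends on. The tiny network, the ablation approach, and all the numbers are made up for illustration.

```python
# NOT Anthropic's circuit-tracing pipeline. A generic toy showing the flavor
# of the idea: knock out internal units one at a time and measure how much
# the output depends on each of them.
import numpy as np

rng = np.random.default_rng(0)

# A tiny random two-layer network standing in for "the model".
W1 = rng.normal(size=(8, 4))   # input -> 8 hidden features
W2 = rng.normal(size=(1, 8))   # hidden features -> a single output score

def forward(x, ablate=None):
    h = np.maximum(W1 @ x, 0.0)   # hidden feature activations (ReLU)
    if ablate is not None:
        h[ablate] = 0.0           # intervention: silence one hidden feature
    return (W2 @ h).item()

x = rng.normal(size=4)            # a stand-in for "the prompt"
baseline = forward(x)

# Rank hidden features by how much the answer moves when each one is removed.
effects = {i: abs(baseline - forward(x, ablate=i)) for i in range(8)}
for i, effect in sorted(effects.items(), key=lambda kv: -kv[1]):
    print(f"feature {i}: effect on output = {effect:.3f}")
```

The real work operates on features that correspond to human-recognizable concepts rather than raw hidden units, but the “perturb and see what the answer depends on” spirit is the same.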
Some of the interesting findings:
Language appears to be independent of concepts — it’s trivial for the model to parse a query in one language and answer in another. The French “petit” and the English “small” map to the same concept. (A rough illustration using an off-the-shelf multilingual embedding model appears after this list.)
When “reasoning,” the model is sometimes just bullshitting you. Researchers found that the “chain of thought” an end user sees does not always reflect the computation actually happening inside the model.
Models have created novel ways to solve math problems. Watching exactly how the model handled simple arithmetic revealed strategies that no human was ever taught in school. (A toy analogue of one such strategy appears after this list.)
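The paper makes the language point by looking inside Claude itself, which we can’t do from the outside. But the “shared concept space” idea can be loosely illustrated with any off-the-shelf multilingual embedding model: words that mean the same thing in different languages land near each other, and unrelated words don’t. A minimal sketch using the open-source sentence-transformers library follows; the model name and word choices are just illustrative, not taken from the paper.

```python
# Loose illustration of "same concept, different language" using a public
# multilingual embedding model; this is not Anthropic's circuit tracing.
from sentence_transformers import SentenceTransformer, util

# Publicly available multilingual model (an illustrative choice).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

embeddings = model.encode(["small", "petit", "bicycle"])

# The English "small" and the French "petit" should sit close together in the
# shared vector space, while an unrelated word sits farther away.
print("small vs. petit:  ", util.cos_sim(embeddings[0], embeddings[1]).item())
print("small vs. bicycle:", util.cos_sim(embeddings[0], embeddings[2]).item())
```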
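On the math point, the paper describes Claude adding numbers like 36 + 59 along parallel paths: one path roughs out the approximate size of the answer while another pins down that it has to end in 5, and the two are combined into 95. Below is a deliberately crude toy analogue of that “rough estimate plus exact last digit” idea. The band construction is just one way of making the reconciliation step well-defined; it is not the model’s actual mechanism.

```python
# Toy analogue of the "parallel paths" addition strategy described in the
# paper: one path produces only a rough band for the answer, another path
# produces only the ones digit, and the two are reconciled at the end.
# This illustrates the idea; it is not the model's actual circuit.
import math

def rough_path(a: int, b: int) -> range:
    """Coarse path: a rough estimate of the sum, reported as a band of ten
    consecutive numbers that is guaranteed to contain the true sum."""
    estimate = a + math.floor(b / 10 + 0.5) * 10   # e.g. 36 + 59 -> 36 + 60 = 96
    return range(estimate - 5, estimate + 5)

def ones_digit_path(a: int, b: int) -> int:
    """Precise path: only the ones digit of the sum."""
    return (a % 10 + b % 10) % 10                  # e.g. 6 + 9 = 15 -> ends in 5

def combine(a: int, b: int) -> int:
    """Exactly one number in the rough band ends in the right digit."""
    digit = ones_digit_path(a, b)
    return next(n for n in rough_path(a, b) if n % 10 == digit)

print(combine(36, 59))  # 95
```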
Anthropic made a helpful video that describes the research clearly.
Anthropic is working hard to catch up to industry leader OpenAI as it seeks to grow revenues to cover the expensive computing resources needed to offer its services. Amazon has invested $8 billion in the company, and Anthropic’s Claude model will be used to power parts of the AI-enhanced Alexa.