This team decides if their AI can be used to kill people
When AI companies like OpenAI, Meta, or Anthropic finish training a new AI model, one of the last steps in the process is to put the model in front of “red teams” — experts in biology, nuclear weapons, cybersecurity, and misinformation — who try to get the model to do very bad things.
The Wall Street Journal profiled Anthropic's in-house red team, which assigns risk scores based on its testing; the company uses those scores to decide whether a model is safe to release. From the Journal story:
“Nearly two weeks after Anthropic started its latest round of safety evals, there was a smile of qualified relief on Graham’s boyish face. The new Sonnet 3.5 had crept closer to the company’s next threshold for dangerous capabilities, but hadn’t blasted past the red lines.”