AI Auditing
-
Anthropic Uses AI Agents to Audit Models for Safety
Anthropic is using AI agents to audit and improve the safety of its AI models, like Claude. This “digital detective squad” includes Investigator, Evaluation, and Red-Teaming Agents that identify vulnerabilities and potential harms proactively. These agents have successfully uncovered hidden objectives, quantified existing problems, and exposed dangerous behaviors in AI models. While not perfect, these AI safety agents help humans focus on strategic oversight and pave the way for automated AI monitoring as systems become more complex.