AI Werewolf

7 AI Werewolves: GPT-5 Dominates, Kimi-K2 Plays Aggressively

In a benchmark simulating social dynamics, seven LLMs played the game Werewolf. GPT-5 outperformed the others by a wide margin with a 96.7% win rate, demonstrating superior strategic thinking and manipulation skills. Other models, including Qwen3 and Kimi-K2, showed respectable performance. Analysis revealed a distinct playing style in each model; Kimi-K2, for example, used aggressive tactics. The experiment suggests that AI agents operating within human teams should be evaluated on social skills in addition to traditional benchmarks.

September 2, 2025