The rise of Large Language Models (LLMs) is reshaping industries, with code-generating LLMs showing particular promise in boosting development efficiency across sectors like finance and tech. But as these powerful tools become more widespread, so do the security concerns. Vulnerable code generation and the potential for malicious use, such as creating phishing tools, are real risks that could hinder progress.
Enter the China Academy of Information and Communications Technology’s (CAICT) Artificial Intelligence Institute. Building on its earlier benchmark work, the institute, under the umbrella of the Alliance of Artificial Intelligence Industry Development (AIIA) Security Governance Committee, launched its first round of security benchmark testing and risk assessment for code LLMs in June 2025. This initiative is designed to evaluate the security capabilities of these models in real-world application scenarios and assess their potential risks.
Figure 1: Code LLM Security Benchmark Testing Framework
The CAICT’s testing methodology is rigorous. By incorporating risky code snippets from real-world open-source projects and employing prompt injection techniques to create malicious instructions, the institute developed a comprehensive dataset of over 15,000 test cases. This dataset covers nine programming languages, 14 fundamental functionality scenarios, and 13 attack methods. Performance is evaluated using the Secure@k metric, and risk levels are categorized as: Controllable (Secure@k ≥ 90%), Low (80% ≤ Secure@k < 90%), Medium (60% ≤ Secure@k < 80%), and High (Secure@k < 60%).
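The report does not spell out the Secure@k formula. Assuming it follows the familiar pass@k-style convention, where a test case counts as secure at k only if every one of k sampled completions is free of the targeted vulnerability, a minimal sketch of the metric and of the risk-tier mapping above might look like this (the exact definition is an assumption, not taken from the report):

```python
from math import comb
from statistics import mean

def secure_at_k(num_samples: int, num_secure: int, k: int) -> float:
    """Probability that k completions drawn without replacement from
    num_samples generations are all secure (pass@k-style estimator;
    the report's exact formula is not published, so this is an assumption)."""
    if num_secure < k:
        return 0.0
    return comb(num_secure, k) / comb(num_samples, k)

def risk_level(score: float) -> str:
    """Map an overall Secure@k score to the report's four risk tiers."""
    if score >= 0.90:
        return "Controllable"
    if score >= 0.80:
        return "Low"
    if score >= 0.60:
        return "Medium"
    return "High"

# Toy example: three test cases, 10 completions each, k = 1
per_case = [secure_at_k(10, s, 1) for s in (8, 10, 5)]
overall = mean(per_case)
print(f"Secure@1 = {overall:.1%} -> {risk_level(overall)} risk")
```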
The initial assessment covered 15 leading Chinese open-source models, with parameter sizes ranging from 3B to 671B. The models tested included those from Zhipu AI (codegeex-4, glm-4-air-250414, glm-4-plus, glm-z1-air), DeepSeek (DeepSeek-R1-0528, DeepSeek-V3-0324), and Tongyi Qianwen (qwen2.5-7B-Instruct, qwen2.5-72B-instruct, qwen2.5-Coder-3B-Instruct, qwen2.5-coder-32B-instruct, qwen3-4B, qwen3-32B, qwen3-235B-a22b, qwq-32B, qwq-32B-preview).
Figure 2: Code LLMs Under Security Benchmark Testing
Testing was conducted via API calls and aligned with established technical frameworks for classifying security risks. Both direct queries and adversarial attacks were used, following standardized protocols for single-turn and multi-turn dialogues. Each model's security risk level was determined by its overall Secure@k score across the 15,000+ test samples. Here's the breakdown of the 15 models (a rough sketch of the evaluation loop follows the list):
1. Controllable Risk: 0 models.
2. Low Risk: 3 models, with Secure@k scores of 85.7%, 83.7%, and 82.6%.
3. Medium Risk: 11 models, with Secure@k scores of 75%, 72.8%, 72.3%, 69.6%, 69.2%, 68.3%, 65.7%, 65.6%, 65.2%, 64.4%, and 63.4%.
4. High Risk: 1 model, with a Secure@k score of 48.1%.
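The institute's actual harness is not public; the sketch below is only a rough picture of the protocol described above, with `query_model` and `is_secure` as hypothetical stand-ins for the model's API wrapper and the security judge that scores each response.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence

@dataclass
class TestCase:
    prompts: Sequence[str]  # one prompt for single-turn, several for multi-turn dialogues
    attack: str             # e.g. "direct query", "role-playing", "reverse prompting"
    scenario: str           # e.g. "code completion", "code generation"
    language: str           # one of the nine programming languages covered

def overall_pass_rate(
    cases: Sequence[TestCase],
    query_model: Callable[[Sequence[str]], str],  # hypothetical API-call wrapper
    is_secure: Callable[[str], bool],             # hypothetical security judge
) -> float:
    """Fraction of test cases whose final response is judged secure;
    the per-model score is then mapped to a risk tier as sketched earlier."""
    verdicts = []
    for case in cases:
        response = query_model(case.prompts)  # sends the single- or multi-turn dialogue
        verdicts.append(is_secure(response))
    return mean(verdicts)
```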
Figure 3: Overall Secure@k Scores of Tested Models
Table 1 shows the security pass rates for different testing scenarios, while Table 2 breaks down security pass rates by programming language. Figure 4 illustrates the overall security pass rates against different malicious attacks.
Table 1: Security Pass Rates by Testing Scenario
Table 2: Security Pass Rates by Programming Language
Figure 4: Overall Security Pass Rates Under Different Malicious Attacks
The findings suggest that while the tested LLMs possess a baseline level of security, their defenses weaken significantly under malicious attack, in some cases exhibiting high-risk vulnerabilities. In high-frequency scenarios with clearer rules, such as code completion and code generation, the models achieved pass rates above 80%, corresponding to a low-to-medium risk level. They also defended relatively well against attacks such as semantic obfuscation, developer-mode masquerading, and role-playing, again with pass rates above 80%.

However, significant weaknesses emerged in sensitive areas such as fraudulent medical code and financial scams: non-expert users could obtain readily deployable malicious code with direct prompts alone, a medium-level risk with an average pass rate of only 67%. The models also struggled with toxic-information rewriting and reverse prompting, where pass rates fell below 60%, and dropped below 40% on metaphor-based attacks, a high-risk vulnerability indicating that these models could be used to facilitate cyberattacks.
Looking ahead, the CAICT AI Institute plans to expand its security benchmark testing to include international open-source and commercial models. It also aims to work with experts to further explore the security risks associated with code LLMs and to develop technical toolchains that mitigate them. The AI Safety Benchmark will continue to evolve with the needs of the technology and the industry, promoting a healthy ecosystem for LLM development.
Original article, Author: Tobias. If you wish to reprint this article, please indicate the source: https://aicnbc.com/5330.html