Code Generation
-
AI Safety Benchmark: Code Model Safety Testing Results Released
CAICT’s AI Institute launched a security benchmark for code-generating LLMs that assesses both risks and capabilities using a dataset of more than 15,000 test cases spanning nine languages and a range of attack methods. The initial assessment of 15 Chinese models (3B to 671B parameters) found varied security levels, with most models rated as medium risk. The models were weakest in scenarios involving malicious intent, highlighting their vulnerability to cyberattacks. CAICT plans to extend testing to international models and to develop mitigation tools, with the aim of promoting a secure LLM ecosystem.
-
OpenAI Unveils Powerful New ChatGPT Agent: Capable of Coding, Creating Presentations, and Analyzing Finance
OpenAI has launched ChatGPT Agent, a unified AI agent that combines web interaction, information gathering, and advanced conversational abilities. Built on modules for web automation and in-depth research plus an enhanced GPT-4 dialogue engine, it can carry out complex tasks such as financial research, presentation creation, and code generation. It still has limitations in complex modeling and non-English text analysis. The agent is available to Pro, Plus, and Team subscribers, and OpenAI plans to add voice interaction and expand into healthcare and education.