Online harms
-
OpenAI Unveils Open-Weight Safety Models for Custom Harm Classification
OpenAI released two open-weight reasoning models, gpt-oss-safeguard-120b and gpt-oss-safeguard-20b, designed to help developers classify and mitigate online harms. Rather than encoding a fixed harm taxonomy, the models take a developer-written policy at inference time and reason over whether a given piece of content violates it, so organizations can tailor classification to their own rules and inspect the reasoning behind each decision. Developed in collaboration with ROOST, Discord, and SafetyKit, the release is intended to give trust-and-safety teams a transparent, customizable tool as AI-generated content scales. The model weights are available for download on Hugging Face.
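
For developers, usage would resemble standard Hugging Face inference: the sketch below is illustrative only, assuming the smaller model is published under a repo id like openai/gpt-oss-safeguard-20b and supports the transformers chat interface. The policy text and prompt structure are assumptions, not the official format.

```python
# Minimal sketch: classifying content against a custom policy with one of the
# released safeguard models via Hugging Face transformers. The repo id, policy
# wording, and expected output labels here are illustrative assumptions.
from transformers import pipeline

classifier = pipeline(
    "text-generation",
    model="openai/gpt-oss-safeguard-20b",  # assumed Hugging Face repo id
    torch_dtype="auto",
    device_map="auto",
)

# The models are described as taking a developer-written policy at inference
# time; this prompt framing is a guess at how that might look in practice.
policy = (
    "Policy: flag content that encourages or glorifies self-harm. "
    "Respond with VIOLATES or ALLOWED, followed by a brief rationale."
)
content = "Example user post to classify."

messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": content},
]

result = classifier(messages, max_new_tokens=256)
# The chat-style pipeline returns the full conversation; the last message
# is the model's classification and rationale.
print(result[0]["generated_text"][-1]["content"])
```

Because the policy lives in the prompt rather than the weights, swapping rules (e.g., tightening a harassment definition) would only require editing the policy text, with no retraining.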