AI Training Data
-
EU Antitrust Probe Targets Google’s Use of Online Content for AI
The EU has launched an antitrust probe into Google’s use of copyrighted web and YouTube content to train its AI models. Regulators are examining whether the firm imposes unfair terms on publishers, grants itself privileged data access, and disadvantages rival AI developers. The investigation could lead to mandatory licensing schemes that compensate creators. The probe follows recent EU enforcement actions against U.S. tech firms, including fines on X and a probe into Meta’s WhatsApp data use, and highlights growing regulatory scrutiny of Big Tech’s data practices in Europe.
-
AI Data Poisoning Alert: 0.01% Fake Training Text Can Increase Harmful Content by 11.2%
China’s Ministry of State Security has warned that “data poisoning” is a critical threat to AI. Inaccurate, fabricated, and biased data corrupt AI training datasets, leading to flawed models and security risks. Even minimal contamination matters: when just 0.01% of training text is fabricated, harmful content generation can increase by 11.2%. The proliferation of AI-generated content further amplifies the issue, creating a “post-contamination legacy”. Authorities highlight dangers in finance, public safety, and healthcare, where data manipulation can trigger market volatility, social panic, and incorrect medical advice, respectively.