GAIA Benchmark
-
“What Can Manus Accomplish in Just 7 Minutes?”
Amid U.S. export restrictions on NVIDIA’s AI chips to China, Chinese startup Manus has launched its open-access general-purpose AI agent globally. Developed by Beijing Butterfly Effect Technology and led by Xiao Hong, Manus uses a three-part autonomous system (Planner, Executor, Validator) to carry out complex workflows across more than 60 domains, and reportedly outperforms rival agents on benchmarks. During the initial invite-only phase, resales of invitation codes surged; the platform now offers cloud-based task processing, although CNBC’s tests of its content generation and multimedia production gave mixed results. Analysts highlight the strategic timing of its debut amid tech trade tensions and see it as a potential disruptor in the global AI market, using cognitive automation to bridge conceptualization and execution in human-AI collaboration.
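The Planner, Executor, Validator split described above can be illustrated with a minimal sketch. This is not Manus's implementation, which the summary does not describe. Every name here (`plan`, `execute`, `validate`, `run_agent`, `StepResult`, `max_retries`) is hypothetical, and the three roles are reduced to plain functions so the control flow is visible.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    step: str     # the sub-task produced by the planner
    output: str   # what the executor returned for it
    ok: bool      # whether the validator accepted the output


def plan(task: str) -> List[str]:
    # Hypothetical planner: split a task description into steps on ';'.
    return [s.strip() for s in task.split(";") if s.strip()]


def execute(step: str) -> str:
    # Hypothetical executor: a stand-in for a real tool or model call.
    return step.upper()


def validate(step: str, output: str) -> bool:
    # Hypothetical validator: accept any non-empty output.
    return bool(output)


def run_agent(
    task: str,
    planner: Callable[[str], List[str]] = plan,
    executor: Callable[[str], str] = execute,
    validator: Callable[[str, str], bool] = validate,
    max_retries: int = 1,
) -> List[StepResult]:
    """Plan the task, then execute and validate each step, retrying on failure."""
    results = []
    for step in planner(task):
        output, ok = "", False
        for _ in range(max_retries + 1):
            output = executor(step)
            ok = validator(step, output)
            if ok:
                break
        results.append(StepResult(step, output, ok))
    return results


if __name__ == "__main__":
    for result in run_agent("collect sources; draft report"):
        print(result)
```

Keeping the three roles as separate callables means any one of them could be swapped out, for example for a stricter validator, without touching the loop that ties them together.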
-
ChatGPT Unveils Agentic Features to Revolutionize Complex Research Execution
OpenAI launched Deep Research, an agentic AI feature that extends ChatGPT’s multi-stage analytical workflows by autonomously synthesizing vetted online sources. Aimed at complex domains such as financial modeling and supply chain risk, it achieved 26.6% accuracy on a set of 3,000 cross-disciplinary questions (vs. 9.4% for competitors) and 72.57% on the GAIA benchmark. Although it outperforms prior models in rigor and documentation, it remains limited in resolving conflicting data and in probabilistic reasoning. It is initially available to Pro-tier users and is not deployed in EU jurisdictions, which raises compliance concerns for sectors that require calibrated confidence thresholds.