Multimodal AI

  • Baidu ERNIE Outperforms GPT and Gemini in Multimodal AI Benchmarks

    Baidu’s new ERNIE-4.5 model rivals GPT and Gemini in multimodal AI, with a focus on enterprise data, including visual formats such as schematics and video. Its mixture-of-experts architecture activates only about 3 billion parameters per token, which keeps inference costs down. ERNIE is strongest at interpreting non-textual data, solving complex visual problems, and automating downstream tasks, and benchmarks show competitive performance on visual question answering. The model aims to bridge the gap from perception to automation by extracting structured data from visuals and feeding it into business systems (a hedged usage sketch follows this item), though substantial hardware is still required. The weights are released under the Apache 2.0 license.

    December 1, 2025
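
    Because the weights are open under Apache 2.0, the natural way to probe the visual question answering claim is through a standard inference stack. The sketch below is a minimal, hedged example of asking a question about a schematic; the model identifier, chat-message format, and processor interface are assumptions based on common Hugging Face transformers conventions, not details confirmed in the summary above.

    ```python
    # Hedged sketch: visual question answering with an open ERNIE-4.5 VL checkpoint.
    # Assumptions (not from the article): the weights are published on Hugging Face
    # under an identifier like the one below, and the remote code exposes the usual
    # transformers AutoProcessor / AutoModelForCausalLM interface.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    MODEL_ID = "baidu/ERNIE-4.5-VL-28B-A3B-PT"  # hypothetical identifier

    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )

    # Ask a question about a schematic, the kind of enterprise visual the summary mentions.
    image = Image.open("pump_schematic.png")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": "List every labeled valve in this schematic as JSON."},
            ],
        },
    ]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=256)

    # Decode only the newly generated tokens, not the echoed prompt.
    print(processor.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
    ```

    Note that even with only ~3 billion parameters active per token, the full set of expert weights must still be resident in memory, which is the "substantial hardware" caveat noted above.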
  • Academician Zheng Weimin Urges Accelerated Development of Domestic CUDA-Compatible Platforms

    At the 2025 Sohu Tech Summit, academician Zheng Weimin surveyed AI trends, emphasizing China’s focus on multimodal large models and on strategic AI deployment in GDP-critical sectors such as manufacturing, finance, and healthcare. He addressed the paradox of continued reliance on NVIDIA’s GPU ecosystem amid export restrictions and chip shortages: domestic developers are advancing hardware alternatives but remain hampered by software fragmentation. Zheng proposed a dual strategy: build a “pseudo-CUDA” compatibility environment to ease migration of existing software (a toy illustration follows this item), and prioritize benchmarking domestic hardware against international standards despite the late start. He argued that reaching 60-80% of international performance, paired with localized optimization, could drive adoption in key areas such as vision and speech processing, allowing China to work around entrenched ecosystem dominance through targeted interoperability as global data policies tighten.

    May 18, 2025
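
    To make the idea of a “pseudo-CUDA” compatibility environment concrete, the toy sketch below shows what a drop-in shim means at the API level: existing code keeps calling the interface it was written against while a different backend does the work. Everything here is illustrative, with NumPy standing in for a hypothetical domestic accelerator runtime; it is not a description of any system Zheng proposed.

    ```python
    # Toy illustration only: what "drop-in API compatibility" looks like in practice.
    # A thin shim exposes the interface existing code already calls (here, a tiny
    # slice of a CuPy-style array API) while routing the work to another backend.
    import numpy as np


    class PseudoBackendArrayAPI:
        """Mimics a few CuPy-style entry points so calling code needs no changes."""

        @staticmethod
        def asarray(data):
            # A real shim would copy the data onto the domestic accelerator.
            return np.asarray(data)

        @staticmethod
        def matmul(a, b):
            # A real shim would dispatch to the vendor's BLAS-equivalent kernel.
            return np.matmul(a, b)

        @staticmethod
        def asnumpy(device_array):
            # A real shim would copy results back from the accelerator to the host.
            return np.asarray(device_array)


    # Application code written against the CUDA-backed API keeps working once
    # `cp` is rebound to the shim, which is the point of a compatibility layer.
    cp = PseudoBackendArrayAPI

    x = cp.asarray([[1.0, 2.0], [3.0, 4.0]])
    w = cp.asarray([[0.5], [0.25]])
    y = cp.asnumpy(cp.matmul(x, w))
    print(y)  # [[1. ], [2.5]]
    ```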