Visual Data Analysis
Baidu ERNIE Outperforms GPT and Gemini in Multimodal AI Benchmarks
Baidu’s new ERNIE-4.5 model rivals GPT and Gemini in multimodal AI and targets enterprise data, including visual formats such as schematics and video. Its lightweight architecture activates only 3 billion parameters, which reduces inference costs. ERNIE is built to interpret non-textual data, solve complex visual problems, and automate tasks, and benchmarks show competitive performance in visual question answering. The model aims to bridge the gap from perception to automation by extracting structured data from visuals and integrating with business systems, although running it still requires substantial hardware. It is available under the Apache 2.0 license.
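
To make the link between active parameters and inference cost concrete, here is a minimal back-of-the-envelope sketch. It relies on the common rule of thumb that a forward pass costs roughly 2 floating-point operations per active parameter per token. The 3 billion figure comes from the summary above; the total parameter count is a hypothetical value chosen only for illustration, and the function names are likewise made up for this example.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter per token."""
    return 2.0 * active_params


def sparse_vs_dense_ratio(active_params: float, total_params: float) -> float:
    """Fraction of a same-size dense model's per-token compute that a sparse model spends."""
    if not 0 < active_params <= total_params:
        raise ValueError("expected 0 < active_params <= total_params")
    return forward_flops_per_token(active_params) / forward_flops_per_token(total_params)


ACTIVE_PARAMS = 3e9   # from the summary: 3 billion parameters activated
TOTAL_PARAMS = 30e9   # ASSUMPTION: hypothetical total size, not stated in the summary

if __name__ == "__main__":
    flops = forward_flops_per_token(ACTIVE_PARAMS)
    ratio = sparse_vs_dense_ratio(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"Approximate forward FLOPs per token: {flops:.1e}")
    print(f"Share of dense-model compute per token: {ratio:.0%}")
```

Under these assumptions the estimate covers compute only. All parameters generally still have to be held in memory even when only a subset is active, which is consistent with the note that substantial hardware is required.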