AI inference

  • Qualcomm Enters AI Chip Market, Challenging AMD and Nvidia

    Qualcomm is entering the data center AI accelerator market with its AI200 and AI250 chips, planned for 2026 and 2027, challenging Nvidia’s dominance. Leveraging its expertise in mobile NPUs, the company aims to capitalize on the booming AI server market, emphasizing total-cost-of-ownership advantages and higher memory capacity (768 GB per accelerator card). Qualcomm is focusing initially on AI inference and offers flexible system configurations. A partnership with Saudi Arabia’s Humain underscores its commitment to the sector.

    November 7, 2025
  • Moving AI Workloads from the Cloud to Devices: A Strategy for Reducing Power Consumption

    Arm CEO Rene Haas advocates shifting AI workloads from cloud-based infrastructure to local devices to reduce energy consumption and improve sustainability. He highlights a move toward hybrid computing, with AI training in the cloud and inference running on devices such as smartphones and AR glasses. Arm’s expanded partnership with Meta aims to optimize AI efficiency across the entire compute stack, exemplified by on-device speech recognition in Meta’s Ray-Ban Wayfarer glasses, which improves responsiveness and reduces reliance on cloud servers.

    October 19, 2025
  • First AI GPU Chipset Comparison Report: NVIDIA Dominates, Huawei Surpasses AMD

    A Morgan Stanley report finds AI inference to be highly profitable, with average profit margins exceeding 50% for “AI inference factories.” NVIDIA’s GB200 NVL72 leads with a profit margin of nearly 78%, followed by Google’s TPU v6e pod (74.9%) and AWS’s Trn2 UltraServer (62.5%). Huawei’s Ascend CloudMatrix 384 achieves 47.9%. AMD’s MI300X and MI355X, by contrast, show significantly negative profit margins due to insufficient token-generation efficiency.

    August 16, 2025
  • Huawei’s Upcoming AI Breakthrough: Potential Reduction in HBM Dependence

    Huawei is expected to unveil a significant AI inference technology at an upcoming forum, potentially reducing China’s reliance on High Bandwidth Memory (HBM). HBM is crucial for AI inference because its high bandwidth and capacity allow the parameters of large language models to be read quickly on every decoding step. However, HBM supply constraints and export restrictions are pushing Chinese companies to seek alternatives. This innovation could improve the performance of Chinese AI models and strengthen the Chinese AI ecosystem. (A rough bandwidth-bound decoding estimate illustrating why HBM matters is sketched below.)

    August 9, 2025
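
    As a minimal back-of-envelope sketch (not drawn from the article) of why HBM matters for inference: during autoregressive decoding, each generated token requires streaming roughly all model weights from memory, so single-stream throughput is approximately bounded by memory bandwidth divided by model size. The model size, quantization, and bandwidth figures below are illustrative assumptions.

    ```python
    # Back-of-envelope: memory bandwidth as the ceiling on LLM decode throughput.
    # All figures are illustrative assumptions, not vendor specifications.

    def decode_tokens_per_sec(params_billions: float, bytes_per_param: float,
                              bandwidth_tb_per_s: float) -> float:
        """Upper bound on single-stream decode rate: every generated token
        requires reading all model weights from memory once."""
        model_bytes = params_billions * 1e9 * bytes_per_param
        bandwidth_bytes = bandwidth_tb_per_s * 1e12
        return bandwidth_bytes / model_bytes

    # Hypothetical 70B-parameter model with 8-bit weights.
    hbm_rate = decode_tokens_per_sec(70, 1.0, 3.0)   # ~3 TB/s, HBM-class stack
    dram_rate = decode_tokens_per_sec(70, 1.0, 0.5)  # ~0.5 TB/s, conventional DRAM
    print(f"HBM-class:  ~{hbm_rate:.0f} tokens/s per stream")
    print(f"DRAM-class: ~{dram_rate:.0f} tokens/s per stream")
    ```

    The roughly 6x gap simply mirrors the assumed bandwidth ratio, which is why any alternative to HBM would need to compress model weights, cache them more effectively, or aggregate bandwidth across many devices.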