Huawei Ascend 384 Super Node Debuts, Reportedly Leading NVIDIA and AMD by a Generation

The 2025 WAIC features Huawei’s debut of the Atlas 900 A3 SuperPoD, built on the Ascend 384 Super Node. The super node uses advanced bus technology for high-bandwidth, low-latency interconnection among 384 NPUs, addressing communication bottlenecks in large AI clusters. Huawei’s CloudMatrix 384 (CM384) AI cluster, built around Ascend chips, delivers 300 PFLOPS of dense BF16 compute, reportedly approaching double that of NVIDIA’s GB200 NVL72. Some international investment banks suggest Huawei’s scaled solution leads current market offerings from NVIDIA and AMD by a generation.

CNBC AI News, July 26th – The 2025 World Artificial Intelligence Conference (WAIC) kicked off with a bang today at the Shanghai World Expo Exhibition & Convention Center. A major highlight: Huawei is showcasing its Ascend 384 Super Node, officially named the Atlas 900 A3 SuperPoD, on-site for the first time.

[Photo gallery: Huawei’s Ascend 384 Super Node makes its official debut at WAIC 2025]

Built on a super-node architecture, the Atlas 900 A3 SuperPoD leverages advanced bus technology to achieve high-bandwidth, low-latency interconnection among its 384 NPUs, addressing the communication bottlenecks that typically arise between compute and storage resources in large-scale clusters.

Furthermore, system-level optimizations ensure efficient resource scheduling, enabling the super-node to operate with the stability and predictability of a single, unified computing entity.

Huawei previously unveiled its Ascend super-node concept at the Kunpeng Ascend Developer Conference in May, showcasing its ability to achieve high-speed bus interconnection across a record-breaking 384 cards.

The Ascend super-node boasts significant advantages, including ultra-high bandwidth, ultra-low latency, and exceptional overall performance, supporting a wide range of training and inference workloads.

Its innovative architecture is specifically designed to meet the demanding requirements of model training and inference, providing the low-latency, high-bandwidth, and long-term reliability needed for cutting-edge AI applications.

According to official statements, Huawei’s CloudMatrix 384 (CM384) AI cluster solution is built around 384 Ascend chips, leveraging a fully interconnected topology for highly efficient inter-chip collaboration.
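
As a back-of-envelope illustration of why the bus is the hard part, the Python sketch below counts the chip pairs an all-to-all topology must service; the pairwise framing is our reading of the “fully interconnected” description, not a confirmed wiring diagram.

```python
from math import comb

def full_mesh_pairs(n: int) -> int:
    """Number of point-to-point chip pairs in a fully interconnected (all-to-all) topology."""
    return comb(n, 2)

# Article figure: 384 NPUs in the CM384 super node.
print(full_mesh_pairs(384))  # 73,536 chip pairs to service
# For comparison, a 72-GPU NVL72-style domain:
print(full_mesh_pairs(72))   # 2,556 chip pairs
```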

This solution delivers a staggering 300 PFLOPS of dense BF16 compute power, reportedly approaching double the performance of NVIDIA’s GB200 NVL72 system.

The CM384 also offers significantly more memory: 3.6 times the total memory capacity and 2.1 times the memory bandwidth of NVIDIA’s comparable system, providing more efficient hardware support for large-scale AI training and inference tasks.
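
To put those ratios in absolute terms, the sketch below applies them to an assumed GB200 NVL72 baseline (72 × 192 GB of HBM3e and 72 × 8 TB/s, per NVIDIA’s public specifications); the baseline figures are our assumption, not part of the article.

```python
# Assumed GB200 NVL72 baseline (NVIDIA public specs, not from this article):
nvl72_capacity_tb = 72 * 192 / 1000   # ~13.8 TB of HBM3e
nvl72_bandwidth_tbs = 72 * 8          # ~576 TB/s aggregate bandwidth

# Article's reported ratios for CM384 vs. NVIDIA's comparable system:
cm384_capacity_tb = 3.6 * nvl72_capacity_tb      # ~49.8 TB
cm384_bandwidth_tbs = 2.1 * nvl72_bandwidth_tbs  # ~1210 TB/s

print(f"CM384 implied HBM capacity:  ~{cm384_capacity_tb:.1f} TB")
print(f"CM384 implied HBM bandwidth: ~{cm384_bandwidth_tbs:.0f} TB/s")
```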

While the performance of a single Ascend chip is approximately one-third that of an NVIDIA Blackwell architecture GPU, Huawei’s scaled-out system design has enabled a significant leap in overall compute capabilities, offering greater competitiveness in ultra-large-scale model training and real-time inference scenarios.
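
Those two claims, one-third the per-chip performance yet nearly double the system-level throughput, are arithmetically consistent, as the quick check below shows using only the figures reported above.

```python
# Sanity-check the article's own numbers (all figures as reported above).
system_bf16_pflops = 300          # CM384 dense BF16, per the article
chips = 384
per_chip_pflops = system_bf16_pflops / chips
print(f"Implied per-chip dense BF16: ~{per_chip_pflops:.2f} PFLOPS")  # ~0.78

# "One-third of a Blackwell GPU" per chip, scaled to 384 chips,
# versus the 72 GPUs in a GB200 NVL72 domain:
blackwell_equivalents = chips * (1 / 3)      # ~128
ratio_vs_nvl72 = blackwell_equivalents / 72  # ~1.78x, i.e. "approaching double"
print(f"~{blackwell_equivalents:.0f} Blackwell-equivalents, "
      f"~{ratio_vs_nvl72:.2f}x an NVL72 domain")
```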

Some international investment banks have concluded that Huawei’s scaled solution “leads NVIDIA and AMD’s current market offerings by a generation.” They believe that China’s breakthrough in AI infrastructure will have a profound impact on the global AI landscape.

Original article, Author: Tobias. If you wish to reprint this article, please indicate the source: https://aicnbc.com/5706.html
