CNBC AI News, July 26th – The 2025 World Artificial Intelligence Conference (WAIC) kicked off with a bang today at the Shanghai World Expo Exhibition & Convention Center. A major highlight: Huawei is showcasing its Ascend 384 Super Node, officially named the Atlas 900 A3 SuperPoD, in person for the first time.
Built on a super-node architecture, the Atlas 900 A3 SuperPoD leverages advanced bus technology to achieve high-bandwidth, low-latency interconnection between 384 NPUs. This effectively addresses the communication bottlenecks between compute and storage resources that often plague large-scale clusters.
Furthermore, system-level optimizations ensure efficient resource scheduling, enabling the super-node to operate with the stability and predictability of a single, unified computing entity.
Huawei previously unveiled its Ascend super-node concept at the Kunpeng Ascend Developer Conference in May, showcasing its ability to achieve high-speed bus interconnection across a record-breaking 384 cards.
The Ascend super-node boasts ultra-high bandwidth, ultra-low latency, and strong overall performance across a wide range of training and inference workloads. Its architecture is designed specifically for the demands of large-model training and inference, delivering the low latency, high bandwidth, and long-term reliability that cutting-edge AI applications require.
According to official statements, Huawei’s CloudMatrix 384 (CM384) AI cluster solution is built around 384 Ascend chips, leveraging a fully interconnected topology for highly efficient inter-chip collaboration.
This solution delivers a staggering 300 PFLOPs of dense BF16 compute power, reportedly approaching double the performance of NVIDIA’s GB200 NVL72 system.
The CM384 also boasts significantly more memory and bandwidth, with a total memory capacity 3.6 times greater and memory bandwidth 2.1 times greater than NVIDIA’s comparable solutions, providing more efficient hardware support for large-scale AI training and inference tasks.
While the performance of a single Ascend chip is approximately one-third that of an NVIDIA Blackwell architecture GPU, Huawei’s scaled-out system design has enabled a significant leap in overall compute capabilities, offering greater competitiveness in ultra-large-scale model training and real-time inference scenarios.
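These figures can be sanity-checked with some quick arithmetic. The sketch below uses the article's own numbers (300 dense-BF16 PFLOPs across 384 chips) together with the widely reported ~180 dense-BF16 PFLOPs and 72 GPUs for NVIDIA's GB200 NVL72; the NVIDIA figures are assumptions not stated in the article.

```python
# Back-of-envelope check of the compute comparison in this article.
# CM384 numbers come from the article; the GB200 NVL72 numbers are the
# widely reported specs and are assumptions, not article figures.
cm384_bf16_pflops = 300.0   # CloudMatrix 384 dense BF16 compute (article)
cm384_chips = 384           # Ascend chips in a CM384 cluster (article)

nvl72_bf16_pflops = 180.0   # assumed: commonly cited dense BF16 for GB200 NVL72
nvl72_gpus = 72             # assumed: Blackwell GPUs per NVL72 rack

# System-level ratio: ~1.67x, consistent with "approaching double".
system_ratio = cm384_bf16_pflops / nvl72_bf16_pflops

# Per-chip ratio: ~0.78 vs 2.5 PFLOPs, i.e. roughly one-third per chip,
# consistent with the claim about a single Ascend vs a Blackwell GPU.
per_ascend = cm384_bf16_pflops / cm384_chips
per_blackwell = nvl72_bf16_pflops / nvl72_gpus
chip_ratio = per_ascend / per_blackwell

print(f"system: {system_ratio:.2f}x, per chip: {chip_ratio:.2f}x")
```

Under these assumptions the numbers hang together: Huawei trades a weaker chip for a much larger, fully interconnected scale-up domain, which is exactly the scaled-out strategy the article describes.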
Some international investment banks have concluded that Huawei's scaled solution "leads NVIDIA and AMD's current market offerings by a generation," and believe China's breakthrough in AI infrastructure will have a profound impact on the global AI landscape.
Original article, Author: Tobias. If you wish to reprint this article, please indicate the source: https://aicnbc.com/5706.html