Re-architecting for Advantage: Huawei’s AI Stack

Huawei’s CloudMatrix 384, powered by Ascend 910C processors and the MindSpore framework, challenges Nvidia’s dominance in AI acceleration. Adopting Huawei’s ecosystem requires significant adaptation, including transitioning from PyTorch or TensorFlow to MindSpore and utilizing the CANN software stack. ModelArts, Huawei’s AI platform, supports the entire development lifecycle. While its ecosystem lacks the maturity of Nvidia’s, Huawei aims to offer a viable alternative that reduces reliance on US-based technology. Transitioning requires personnel training and code re-architecting, but Huawei provides resources to ease the process.

Huawei is making a bold move in the AI computing landscape with the release of its CloudMatrix 384 AI chip cluster, a system designed to accelerate large-scale AI training. According to Huawei, the system leverages clusters of its Ascend 910C processors, interconnected via high-speed optical links. This distributed architecture offers the potential to surpass traditional GPU-based setups in certain areas, particularly resource utilization and on-chip processing time, even though individual Ascend chips may not match the raw power of competitors’ offerings.

This push from Huawei represents a significant challenge to Nvidia’s dominant position in the AI accelerator market. The company asserts that CloudMatrix 384 positions it as a “formidable challenger to Nvidia’s market-leading position, despite ongoing US sanctions.” However, the transition to Huawei’s ecosystem isn’t without its challenges.

For data scientists and AI engineers, adopting the Huawei framework entails adapting existing workflows to leverage tools and frameworks that are optimized for the Ascend processors. Specifically, this involves embracing MindSpore, Huawei’s proprietary deep learning framework, along with the CANN (Compute Architecture for Neural Networks) software stack.

Framework Transition: From PyTorch/TensorFlow to MindSpore

Unlike Nvidia’s ecosystem, which thrives on widely adopted frameworks like PyTorch and TensorFlow that are meticulously engineered to exploit the power of CUDA, Huawei’s Ascend processors are designed for optimal performance with MindSpore. This presents a potential hurdle for developers deeply entrenched in the Nvidia ecosystem.

Data engineers accustomed to building models in PyTorch or TensorFlow will likely need to convert their models to the MindSpore format or retrain them using the MindSpore API. This process isn’t a simple one-to-one translation.

MindSpore differs from PyTorch and TensorFlow in syntax, training-pipeline structure, and function calls. Replicating the results achieved with existing model architectures and training pipelines may therefore require a significant degree of re-engineering. Subtle differences exist even in fundamental operations, such as the padding modes of convolution and pooling layers and the default weight-initialization methods, demanding careful attention during migration, as the sketch below illustrates.
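
For example, a minimal sketch of a ported convolutional block, assuming MindSpore’s historical defaults of `pad_mode='same'` and `weight_init='normal'` for `nn.Conv2d` (PyTorch defaults to zero padding and Kaiming-uniform initialization); the layer sizes are illustrative only:

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn

class PortedBlock(nn.Cell):
    """Illustrative block ported from PyTorch; channel sizes are arbitrary."""
    def __init__(self):
        super().__init__()
        # Spell out the PyTorch-equivalent behavior instead of relying on
        # MindSpore's defaults, which can silently change output shapes
        # (pad_mode) and initial weights (weight_init).
        self.conv = nn.Conv2d(
            in_channels=3, out_channels=16, kernel_size=3,
            pad_mode='valid',           # matches PyTorch's default padding=0
            weight_init='he_uniform',   # approximates PyTorch's default init
        )
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def construct(self, x):  # MindSpore Cells define construct(), not forward()
        return self.pool(self.relu(self.conv(x)))

net = PortedBlock()
dummy = ms.Tensor(np.random.randn(1, 3, 32, 32).astype(np.float32))
print(net(dummy).shape)  # (1, 16, 15, 15) with 'valid' padding
```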

Using MindIR for Model Deployment

A key component of MindSpore is MindIR (MindSpore Intermediate Representation). After a model is trained in MindSpore, it can be exported using the `mindspore.export` utility, which converts the trained network into the MindIR format: a self-contained graph-plus-weights representation playing a role broadly comparable to interchange formats such as ONNX in other ecosystems. This intermediate representation facilitates efficient deployment across the various hardware platforms MindSpore supports.
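
A minimal export sketch, using a stand-in network in place of a fully trained model; `mindspore.export` traces the cell with a representative input and writes a `.mindir` artifact:

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn

# Stand-in for a trained model; in practice this would be your trained Cell.
net = nn.SequentialCell([nn.Dense(784, 128), nn.ReLU(), nn.Dense(128, 10)])

# export() needs a representative input to trace shapes and dtypes.
dummy = ms.Tensor(np.zeros((1, 784), dtype=np.float32))
ms.export(net, dummy, file_name="model", file_format="MINDIR")
# Produces model.mindir: a self-contained graph-plus-weights artifact.
```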

Deployment typically involves loading the exported MindIR model and then executing predictions using MindSpore’s inference APIs tailored for Ascend chips. These APIs handle crucial tasks such as model de-serialization, memory allocation, and optimized execution on the Ascend hardware.
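
A corresponding inference sketch, assuming an Ascend device is available (change `device_target` otherwise); `mindspore.load` deserializes the MindIR graph and `nn.GraphCell` wraps it as a callable cell:

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn

# Compile and run on the Ascend backend; MindSpore handles memory
# allocation and optimized execution on the device.
ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")

graph = ms.load("model.mindir")   # deserialize the exported graph
net = nn.GraphCell(graph)         # wrap it as a callable inference cell

batch = ms.Tensor(np.zeros((1, 784), dtype=np.float32))
logits = net(batch)               # executes the compiled graph on the device
```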

Notably, MindSpore enforces a clearer separation between training and inference logic compared to PyTorch or TensorFlow. Consequently, all preprocessing steps must meticulously match the training inputs, and static graph execution must be carefully optimized. To further enhance performance, particularly on specific hardware configurations, Huawei recommends leveraging MindSpore Lite or exploring the Ascend Model Zoo for pre-optimized models and hardware-specific tuning strategies.
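
One practical way to keep serving inputs aligned with training is to define the preprocessing pipeline once and reuse it on both paths. A brief sketch with `mindspore.dataset.vision` transforms; the image size and normalization statistics here are placeholders, not values from any particular model:

```python
import mindspore.dataset.vision as vision

# Placeholder statistics: substitute whatever was used during training.
MEAN = [123.675, 116.28, 103.53]
STD = [58.395, 57.12, 57.375]

# Defined once and applied identically at training and inference time,
# so the deployed graph sees inputs from the same distribution.
preprocess = [
    vision.Resize((224, 224)),
    vision.Normalize(mean=MEAN, std=STD),
    vision.HWC2CHW(),  # MindSpore models typically expect channel-first input
]
```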

Adapting to CANN (Compute Architecture for Neural Networks)

CANN is Huawei’s dedicated software stack for Ascend hardware, comprising a comprehensive suite of tools and libraries; it occupies a functional role similar to that of Nvidia’s CUDA. Huawei advocates using CANN’s profiling and debugging tools to monitor and refine model execution on Ascend hardware, allowing developers to pinpoint bottlenecks and optimize performance effectively.
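
As one concrete entry point, MindSpore exposes a `Profiler` that collects operator-level timing on Ascend via CANN’s profiling facilities. A minimal sketch; `run_training_steps` is a placeholder for your actual training or inference loop:

```python
import mindspore as ms
from mindspore import Profiler

ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")

# Start collecting device-side traces and operator statistics.
profiler = Profiler(output_path="./profiler_data")

run_training_steps()  # placeholder: your training or inference loop

# Post-process the raw trace data; results can be inspected in MindInsight.
profiler.analyse()
```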

Execution Modes: GRAPH_MODE vs. PYNATIVE_MODE

MindSpore offers two primary execution modes:

  • GRAPH_MODE – Pre-compiles the entire computation graph before execution. This approach enables faster execution and potential performance optimizations because the graph can be analyzed and optimized during compilation.
  • PYNATIVE_MODE – Executes operations immediately as they are encountered. This leads to simplified debugging workflows and is generally better suited for the early stages of model development due to its fine-grained error tracking capabilities.

For initial development cycles, PYNATIVE_MODE is generally recommended due to its simplified iterative testing and debugging capabilities. However, when models are nearing deployment, switching to GRAPH_MODE can unlock maximum efficiency on Ascend hardware. This choice allows engineering teams to strategically balance development flexibility with production performance.
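
Switching between the two is a one-line context change. A minimal sketch of the usual pattern:

```python
import mindspore as ms

# Early development: eager execution with per-operator error reporting.
ms.set_context(mode=ms.PYNATIVE_MODE, device_target="Ascend")

# Nearing deployment: whole-graph compilation for maximum Ascend throughput.
# ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")
```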

It’s crucial to adapt code based on the chosen execution mode. For instance, when operating in GRAPH_MODE, it’s generally best practice to minimize the use of Python-native control flow to maximize graph optimization potential.
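
For instance, a data-dependent Python `if` pulls control flow out to the host, whereas a tensor-level select stays entirely inside the compiled graph. A small illustrative sketch:

```python
import mindspore as ms
import mindspore.ops as ops

def clamp_negatives(x):
    # Graph-friendly: ops.select is a single graph node, evaluated on-device.
    # A Python `if (x < 0)` here would depend on a tensor value at trace time.
    return ops.select(x < 0, ops.zeros_like(x), x)

x = ms.Tensor([-1.0, 2.0, -3.0])
print(clamp_negatives(x))  # [0. 2. 0.]
```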

Deployment Environment: Huawei ModelArts

Huawei’s ModelArts, its cloud-based AI development and deployment platform, has been engineered for close integration with Huawei’s Ascend hardware and the MindSpore framework. It provides a comprehensive suite of tools, mirroring platforms such as AWS SageMaker and Google Vertex AI, but with specific optimizations for Huawei’s AI processors.

Huawei emphasizes that ModelArts supports the entire AI development lifecycle, encompassing everything from data labeling and preprocessing, to model training, deployment, and continuous monitoring. Each stage is readily accessible through both APIs and a user-friendly web interface.

In Summary

The transition to MindSpore and CANN requires investment in training and adaptation. Data engineers will need to learn how CANN handles model compilation and optimization for Ascend hardware, adapt existing tooling and automation pipelines built for Nvidia GPUs, and master MindSpore’s specific APIs and workflows. The process may require retraining personnel and re-architecting existing codebases.

While Huawei’s ecosystem is continually evolving, it currently lacks the maturity, robustness, and extensive community support that frameworks like PyTorch with CUDA possess. However, Huawei anticipates that the advantages of migrating to its processes and infrastructure will ultimately outweigh the initial challenges. The company hopes this strategy will enable organizations to reduce their reliance on US-based Nvidia in the long run.

Huawei’s Ascend processors offer compelling performance characteristics tailored for AI workloads. Huawei provides extensive documentation, support resources, and migration guides to assist in the transition process.

Original article, Author: Samuel Thompson. If you wish to reprint this article, please indicate the source: https://aicnbc.com/11671.html
