AI spending in the Asia Pacific region continues to climb, yet many businesses struggle to extract tangible value from their investments. According to industry analysis, a significant contributing factor is the underlying infrastructure, which often cannot support the speed and scale that real-world AI applications demand. Even after substantial spending on generative AI tools, many projects reportedly fall short of their expected return on investment, underscoring the critical role of infrastructure.
This gap between spending and results underscores how strongly the underlying infrastructure shapes the performance, cost-effectiveness, and scalability of AI deployments in the region.
Akamai, in collaboration with NVIDIA, is attempting to tackle this challenge with its “Inference Cloud,” powered by NVIDIA’s latest Blackwell GPUs. This approach is predicated on the idea that AI’s decision-making processes should be situated closer to the end-user, rather than relying on distant data centers, especially given the time-sensitive nature of most AI applications. Akamai posits that a decentralized approach can significantly reduce costs, minimize latency, and facilitate AI services that require instantaneous responses.
Jay Jenkins, CTO of Cloud Computing at Akamai, recently discussed the impetus behind this shift and addressed why the inference stage, as opposed to training, has emerged as the principal bottleneck. He elaborated on the challenges enterprises face in scaling AI projects effectively.
The Infrastructure Bottleneck: Why AI Projects Stumble
Jenkins emphasizes the considerable gap between AI experimentation and full-scale deployment. “Many AI initiatives fail to deliver on expected business value because enterprises often underestimate the gap between experimentation and production,” he stated. Despite the widespread interest in GenAI, challenges such as substantial infrastructure costs, high latency, and difficulties in scaling models frequently impede progress. This is especially true when enterprises operate across multiple clouds.
While most organizations rely on centralized cloud solutions and large GPU clusters, Jenkins argues that these models become prohibitively expensive as usage increases, particularly in regions geographically distant from major cloud hubs. Latency is also a significant concern when a single request requires multiple inference steps, each crossing long network distances. “AI is only as powerful as the infrastructure and architecture it runs on,” Jenkins asserts, emphasizing that latency can significantly degrade the user experience and diminish the intended business value. He also highlighted growing data-regulation and compliance requirements as major obstacles slowing the transition from pilot projects to full-scale production.
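To make the latency point concrete, here is a minimal back-of-the-envelope sketch (all figures are illustrative assumptions, not Akamai measurements) of how round-trip time compounds when a single user request triggers several dependent inference steps:

```python
# Back-of-the-envelope latency model (illustrative numbers only):
# a request that triggers several dependent inference steps pays the
# network round trip on every step.

def total_latency_ms(steps: int, network_rtt_ms: float, compute_ms: float) -> float:
    """Total wait time when each step pays a full round trip plus model compute."""
    return steps * (network_rtt_ms + compute_ms)

# Hypothetical comparison: a distant regional hub vs. an in-country edge site.
for label, rtt_ms in [("centralized hub, ~150 ms RTT", 150.0),
                      ("nearby edge site, ~20 ms RTT", 20.0)]:
    print(f"{label}: {total_latency_ms(steps=5, network_rtt_ms=rtt_ms, compute_ms=80.0):.0f} ms")
```

In this toy model, five dependent steps against a distant hub add up to more than a second of waiting, while the same pipeline served from a nearby edge site takes less than half that.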
The Shifting Focus: Inference Takes Center Stage
As AI adoption in the Asia Pacific region evolves from pilot programs to operational deployments within applications and services, Jenkins notes a consequential shift in computing power consumption. The continuous demands of day-to-day inference, as opposed to periodic training cycles, are now the primary driver. With many organizations deploying language, vision, and multimodal models across diverse markets, the demand for rapid and reliable inference is outpacing initial projections. Complex data environments, regulatory requirements, and varied languages further complicate the landscape in the region. Consequently, existing centralized systems, which were not initially designed to handle this level of responsiveness, are facing unprecedented strain.
Edge Infrastructure: Enhancing AI Performance and Cost Efficiency
According to Jenkins, decentralizing inference by moving it closer to users, devices, or agents can fundamentally alter the cost dynamics and improve end-user experiences. This approach shortens the distance data needs to travel, so models respond faster. It also reduces the cost of moving large volumes of data between major cloud hubs.
Furthermore, physical AI systems, such as robots, autonomous machines, and smart city technologies, require instantaneous decision-making capabilities. Remote inference can significantly hinder the performance of these systems.
Akamai’s analysis indicates that enterprises in India and Vietnam can achieve substantial cost reductions with image-generation models by deploying workloads at the edge rather than relying on centralized cloud solutions. The advantages gained stem from improved GPU utilization and lower egress fees.
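A simple cost model helps illustrate where such savings would come from. The sketch below uses entirely hypothetical unit prices and utilization figures (not Akamai's or any provider's actual pricing) to show how GPU utilization and egress fees drive per-request cost:

```python
# Illustrative cost model (all unit prices are hypothetical placeholders):
# per-request cost as a function of GPU utilization and data egress.

def cost_per_1k_requests(gpu_hour_usd: float, requests_per_gpu_hour: float,
                         egress_gb_per_request: float, egress_usd_per_gb: float) -> float:
    gpu_cost = gpu_hour_usd / requests_per_gpu_hour * 1000
    egress_cost = egress_gb_per_request * egress_usd_per_gb * 1000
    return gpu_cost + egress_cost

# Centralized: responses travel out of a distant hub, incurring long-haul egress.
central = cost_per_1k_requests(gpu_hour_usd=3.0, requests_per_gpu_hour=600,
                               egress_gb_per_request=0.05, egress_usd_per_gb=0.09)
# Edge: responses are served locally, with little long-haul egress and
# (per the article's claim) better GPU utilization assumed.
edge = cost_per_1k_requests(gpu_hour_usd=3.0, requests_per_gpu_hour=900,
                            egress_gb_per_request=0.005, egress_usd_per_gb=0.09)
print(f"centralized ~ ${central:.2f} per 1k requests, edge ~ ${edge:.2f} per 1k requests")
```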
Early Adoption of Edge-Based AI
The demand for edge inference is most pronounced in industries where minimal delays can significantly impact revenue, safety, or user engagement, with retail and e-commerce leading the way. Shoppers are prone to abandoning slow or non-responsive online experiences. Localized and accelerated inference significantly enhances the performance of personalized recommendations, search functionality, and multimodal shopping tools.
The financial sector is another area where latency directly influences value. Workloads such as fraud detection, payment authorization, and transaction scoring rely on rapid, sequential AI decisions. Executing inference closer to the point of data creation enables financial institutions to operate more efficiently and maintain data compliance within regulatory boundaries.
The Rise of Cloud and GPU Partnerships
As AI workloads expand, organizations require infrastructure solutions capable of keeping pace with demand. This has propelled cloud providers and GPU manufacturers into closer collaborations, as illustrated by Akamai’s partnership with NVIDIA. This alliance aims to deploy GPUs, DPUs, and AI software across thousands of edge locations.
The objective is to establish an “AI delivery network” that distributes inference across numerous sites, rather than concentrating it in a few locations. This approach improves performance and facilitates compliance. Jenkins notes that many organizations operating in the APAC region face challenges stemming from disparate data regulations across different markets, underscoring the importance of localized processing. Emerging partnerships are shaping the future direction of AI infrastructure in the region, particularly for workloads demanding low-latency performance.
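In practice, such an “AI delivery network” implies data-aware routing of each request. The sketch below is a hypothetical illustration (the site names, latencies, and residency rules are invented, not Akamai's) of picking the lowest-latency edge site that also satisfies a data-residency constraint:

```python
# Minimal sketch of data-aware request routing (hypothetical sites and rules):
# pick the lowest-latency edge site that is allowed to process the user's data.

from dataclasses import dataclass

@dataclass
class EdgeSite:
    name: str
    country: str
    latency_ms: float  # measured or estimated RTT from the user

def route_request(user_country: str, sites: list[EdgeSite],
                  residency_rules: dict[str, set[str]]) -> EdgeSite:
    """Return the nearest site whose country is permitted for this user's data."""
    allowed = residency_rules.get(user_country, {user_country})
    candidates = [s for s in sites if s.country in allowed]
    if not candidates:
        raise RuntimeError("No compliant edge site available; fall back to an in-country core")
    return min(candidates, key=lambda s: s.latency_ms)

sites = [EdgeSite("mumbai-edge", "IN", 18), EdgeSite("singapore-hub", "SG", 62),
         EdgeSite("hanoi-edge", "VN", 25)]
rules = {"IN": {"IN"}, "VN": {"VN", "SG"}}  # illustrative residency rules only
print(route_request("VN", sites, rules).name)  # -> hanoi-edge
```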
Jenkins also stresses that security is integrated into these systems from the outset. Zero-trust controls, data-aware routing, and fraud and bot protection are becoming standard components of the technology stacks on offer.
Enabling Agentic AI and Automation
The performance of agentic systems, which make sequential decisions, hinges on infrastructure capable of operating at millisecond speeds. Jenkins acknowledges that the region’s diversity presents implementation challenges: differing levels of connectivity, regulatory frameworks, and technical maturity across countries require AI workloads flexible enough to run wherever is most suitable. According to internal research, the cloud still dominates, but most enterprises in the Asia Pacific region are expected to rely on edge services by 2027. Consequently, future infrastructure must be able to store data locally, route tasks to the nearest appropriate location, and keep functioning even on unstable networks.
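One plausible way to meet the “keep working on unstable networks” requirement is tiered fallback: try the nearest edge endpoint under a tight deadline, then fall back to a regional core. The endpoints and timeouts below are hypothetical placeholders, not part of Akamai’s product:

```python
# Sketch of degradation-tolerant inference (hypothetical endpoints): try the
# nearest edge site within a tight deadline, then fall back to a regional core,
# so an agent's step still completes when connectivity is unstable.

import urllib.error
import urllib.request

ENDPOINTS = [
    ("edge-local", "https://edge.local.example/v1/infer", 0.2),      # 200 ms budget
    ("regional-core", "https://core.region.example/v1/infer", 1.0),  # looser budget
]

def infer_with_fallback(payload: bytes) -> bytes:
    last_error = None
    for name, url, timeout_s in ENDPOINTS:
        try:
            req = urllib.request.Request(url, data=payload,
                                         headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req, timeout=timeout_s) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc  # site unreachable or too slow; try the next tier
    raise RuntimeError(f"All inference endpoints failed: {last_error}")
```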
Future Considerations for Enterprises
The shift of inference to the edge will necessitate new operational management strategies. Jenkins advises organizations to prepare for a more distributed AI lifecycle, characterized by model updates across multiple sites. This requires enhanced orchestration and extensive visibility into performance, costs, and errors across both core and edge systems.
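The kind of fleet-wide visibility Jenkins describes might look, in miniature, like the sketch below: each core or edge site reports its model version, latency, error rate, and cost, and the orchestrator flags sites that have not yet received the target rollout (all field names and values are hypothetical):

```python
# Minimal sketch (hypothetical fields) of fleet-wide visibility: track the model
# version, latency, error rate, and cost reported by each core or edge site,
# and flag sites that lag behind the target rollout.

from dataclasses import dataclass

@dataclass
class SiteReport:
    site: str
    model_version: str
    p95_latency_ms: float
    error_rate: float
    usd_per_1k_requests: float

def rollout_gaps(reports: list[SiteReport], target_version: str) -> list[str]:
    """Sites still serving an older model version than the one being rolled out."""
    return [r.site for r in reports if r.model_version != target_version]

reports = [
    SiteReport("mumbai-edge", "v2.3", 110, 0.004, 3.8),
    SiteReport("hanoi-edge", "v2.2", 95, 0.006, 3.5),
    SiteReport("singapore-core", "v2.3", 180, 0.002, 6.1),
]
print("lagging sites:", rollout_gaps(reports, target_version="v2.3"))  # -> ['hanoi-edge']
```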
Data governance becomes more complex but also more manageable with localized processing. Given that regulatory variance is a significant challenge, deploying inference closer to the data’s origin can help enterprises maintain compliance.
Security demands increased attention. While edge inference can enhance resilience, it also means every site must be secured. Enterprises must safeguard APIs and data pipelines and implement defenses against fraud and bot attacks; financial institutions in particular will require robust security controls for decentralized data processing.
Original article, Author: Samuel Thompson. If you wish to reprint this article, please indicate the source: https://aicnbc.com/13491.html