XPENG Partners with Peking University on AI Breakthrough, Sharpening Autonomous Driving Capabilities
XPENG, the electric vehicle manufacturer, has joined forces with Peking University to develop FastDriveVLA, a visual token pruning framework. The technology enables autonomous driving AI to emulate human driving by selectively focusing on essential visual information, reducing computational load by 7.5 times. The research detailing this approach has been accepted to AAAI 2026, a prestigious artificial intelligence conference known for its stringent selectivity, with an acceptance rate of just 17.6% this year.
This development marks a significant leap forward in XPENG’s pursuit of Level 4 autonomous driving. It highlights the company’s comprehensive, in-house expertise across the entire AI stack for mobility solutions and propels the industry closer to the efficient and scalable deployment of sophisticated autonomous driving systems.
The research paper, titled “FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning,” addresses a critical challenge in AI-driven autonomous vehicles. As Vision-Language-Action (VLA) models become central to end-to-end autonomous driving systems thanks to their ability to understand complex environments and reason about actions, the sheer volume of visual tokens they process presents a significant hurdle. Each image is broken down into numerous tokens, which the AI uses to interpret the driving scene and make decisions. Processing this large data stream increases onboard computational demands, potentially degrading real-time performance and inference speed, both crucial for safe and responsive autonomous driving.
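To make the token count concrete, the sketch below shows how a single camera frame is typically split into non-overlapping patches, each becoming one visual token. The resolution and patch size here are illustrative assumptions, not FastDriveVLA’s actual configuration:

```python
# Minimal sketch: turning a camera frame into visual tokens via patchification.
# The 448x896 resolution and patch size of 14 are assumptions for illustration.
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an HxWxC image into non-overlapping flattened patch tokens."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    tokens = (
        image.reshape(h // patch, patch, w // patch, patch, c)
        .transpose(0, 2, 1, 3, 4)          # group the two patch-grid axes together
        .reshape(-1, patch * patch * c)    # one row per visual token
    )
    return tokens

frame = np.zeros((448, 896, 3), dtype=np.float32)  # hypothetical camera frame
tokens = patchify(frame, patch=14)
print(tokens.shape)  # (2048, 588): (448/14) * (896/14) = 32 * 64 = 2048 tokens
```

Even a single modest-resolution frame yields thousands of tokens, which is why the downstream model’s per-token cost dominates onboard compute.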
While previous attempts at visual token pruning, often relying on attention mechanisms or token similarity, have faced limitations in practical driving scenarios, FastDriveVLA offers a fresh perspective. Inspired by the human driver’s innate ability to prioritize salient foreground details while naturally disregarding less critical background elements, the XPENG and Peking University team has devised a reconstruction-based pruning framework.
At its core, FastDriveVLA employs an adversarial foreground-background reconstruction strategy. This technique sharpens the AI’s ability to identify and retain the most pertinent visual tokens, filtering out irrelevant background data. Rigorous testing on the nuScenes autonomous driving benchmark demonstrated FastDriveVLA’s state-of-the-art performance across various pruning ratios. Notably, by reducing the number of visual tokens from 3,249 to 812, the framework achieved a 7.5-fold reduction in computational load without compromising planning accuracy.
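The plug-and-play pruning step itself can be sketched as keeping the top-k tokens ranked by a per-token relevance score. In FastDriveVLA those scores are learned via the adversarial foreground-background reconstruction described above; the random scorer below is only a stand-in, since the paper’s scoring network is not reproduced here:

```python
# Hedged sketch of top-k visual token pruning (3,249 -> 812, as in the paper).
# The relevance scores are random placeholders for a learned foreground scorer.
import numpy as np

def prune_tokens(tokens: np.ndarray, scores: np.ndarray, keep: int):
    """Keep the `keep` highest-scoring tokens, preserving their sequence order."""
    idx = np.sort(np.argsort(scores)[-keep:])  # top-k indices, back in original order
    return tokens[idx], idx

rng = np.random.default_rng(0)
tokens = rng.normal(size=(3249, 768))  # 3,249 visual tokens; dim 768 is assumed
scores = rng.random(3249)              # placeholder for learned relevance scores
kept, idx = prune_tokens(tokens, scores, keep=812)
print(kept.shape)  # (812, 768)
```

Preserving the original token order matters because the downstream model’s positional information is tied to the token sequence; pruning only removes tokens, it does not reorder them.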
This recognition at AAAI 2026 follows another significant achievement for XPENG earlier this year, when the company was the sole Chinese automaker invited to present at CVPR WAD, showcasing advancements in autonomous driving foundation models. Furthermore, XPENG’s AI Day in November unveiled its VLA 2.0 architecture, a pivotal innovation that eliminates the intermediate language translation step, enabling direct Visual-to-Action generation and redefining the traditional V-L-A pipeline.
These cumulative accomplishments underscore XPENG’s robust, full-stack, in-house development capabilities, spanning everything from AI model architecture design and training to model distillation and on-vehicle deployment. The company remains steadfast in its commitment to advancing L4 autonomous driving, aiming to accelerate the integration of physical AI systems into vehicles and ultimately deliver exceptionally safe, efficient, and comfortable intelligent driving experiences to users worldwide.
Original article, Author: Jam. If you wish to reprint this article, please indicate the source: https://aicnbc.com/15083.html