Alibaba’s research division has unveiled two open-source versions of its Wan2.1-VACE video generation model (1.3B and 14B parameters), with the lightweight 1.3B variant capable of running on consumer-grade GPUs. The models are now available on GitHub, Hugging Face, and ModelScope.
What sets Wan2.1-VACE apart is its multimodal input flexibility: it accepts text prompts, images, video clips, segmentation masks, and control signals simultaneously. This enables precise control over character consistency, scene composition, motion trajectories, and movement intensity in generated videos.
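To make that input structure concrete, here is a minimal, purely illustrative sketch of how such multimodal conditioning might be bundled before being handed to a generation pipeline. The class, field names, and file paths below are assumptions for the sake of the example, not the project's actual API; the real interface is documented in the GitHub repository.

```python
# Hypothetical sketch only: the class and field names are illustrative
# assumptions, not the actual Wan2.1-VACE API. It shows the idea of
# supplying text, image, video, mask, and control inputs together.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MultimodalCondition:
    prompt: str                                   # text description of the target video
    reference_images: List[str] = field(default_factory=list)  # subject/identity references
    source_video: Optional[str] = None            # clip to edit, extend, or repaint
    segmentation_mask: Optional[str] = None       # region to replace or preserve
    control_signal: Optional[str] = None          # e.g. pose, depth, or trajectory guidance


if __name__ == "__main__":
    # All paths are placeholders.
    condition = MultimodalCondition(
        prompt="a corgi surfing a wave at sunset",
        reference_images=["corgi_reference.png"],
        source_video="beach_clip.mp4",
        segmentation_mask="surfer_mask.mp4",
        control_signal="motion_trajectory.json",
    )
    print(condition)
```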
The model introduces a “Lego-like” modular design in which creators can combine different capabilities without retraining specialized models (a rough sketch of the idea follows the list). For instance:
- Object replacement in videos by combining image referencing with subject reshaping
- Portrait-to-landscape video conversion using image reference, frame extension, and background expansion modules
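As a rough illustration of this composition idea, the sketch below chains hypothetical "modules" for the portrait-to-landscape case. None of these classes or functions exist under these names in the Wan2.1-VACE codebase; they are invented here purely to show how capabilities could be composed without retraining.

```python
# Hypothetical sketch of "Lego-like" composition; all names are invented
# for illustration and do not correspond to the real Wan2.1-VACE code.
from typing import Callable, Dict, List

VideoState = Dict[str, object]                 # placeholder for frames / latents / metadata
Module = Callable[[VideoState], VideoState]    # a "module" transforms that state


def image_reference(ref_path: str) -> Module:
    def apply(state: VideoState) -> VideoState:
        state["identity_reference"] = ref_path   # keep the subject consistent
        return state
    return apply


def frame_extension(extra_frames: int) -> Module:
    def apply(state: VideoState) -> VideoState:
        state["target_frames"] = int(state.get("target_frames", 0)) + extra_frames
        return state
    return apply


def background_expansion(aspect_ratio: str) -> Module:
    def apply(state: VideoState) -> VideoState:
        state["outpaint_to"] = aspect_ratio      # e.g. expand a 9:16 canvas to 16:9
        return state
    return apply


def compose(modules: List[Module]) -> Module:
    def apply(state: VideoState) -> VideoState:
        for module in modules:
            state = module(state)
        return state
    return apply


if __name__ == "__main__":
    # Portrait-to-landscape conversion expressed as a composition of three capabilities.
    pipeline = compose([
        image_reference("subject.png"),
        frame_extension(48),
        background_expansion("16:9"),
    ])
    print(pipeline({"source_video": "portrait_clip.mp4"}))
```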
Since February, Alibaba’s Tongyi Wanxiang series has progressively open-sourced text-to-video, image-to-video, and keyframe-to-video models. The project has achieved remarkable traction, with 330,000+ downloads and 11,000+ GitHub stars, making it the most widely adopted video generation framework in its release window.
Industry analysts note that this release significantly lowers the hardware barrier for AI video generation while providing professional-grade control, potentially accelerating adoption among indie creators, marketing agencies, and film pre-production pipelines.
Original article, Author: Samuel Thompson. If you wish to reprint this article, please indicate the source: https://aicnbc.com/26.html