ByteDance Unveils End-to-End Simultaneous Interpretation Model: Near-Human Accuracy with 3-Second Latency

ByteDance launched Seed LiveInterpret 2.0, an end-to-end simultaneous interpretation model focused on Chinese-English translation. It offers ultra-low latency (2-3 seconds) and accuracy that rivals human interpreters, exceeding 70% in complex multi-speaker scenarios and 80% in single-speaker speeches. Key features include zero-shot voice cloning and intelligent balancing of translation quality, latency, and speech cadence. Evaluations show it surpasses competing systems, marking a significant advancement in AI translation.


CNBC AI News, July 24th — ByteDance today announced the official launch of its end-to-end simultaneous interpretation model, Seed LiveInterpret 2.0.

According to ByteDance, the system marks a significant leap forward in AI-powered translation, combining extremely low latency with state-of-the-art (SOTA) quality in Chinese-English simultaneous interpretation.

The model is built upon a full-duplex end-to-end speech generation and understanding framework, facilitating bidirectional Chinese-English translation.

It processes multi-speaker speech input in real time, mirroring the “listen and speak” capability of human interpreters with minimal delay. The system can simultaneously receive source-language audio and output translated speech in the target language.
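ByteDance has not published the model's interface, but the full-duplex idea can be pictured as two concurrent loops: one keeps ingesting source audio while the other emits translated speech as soon as it is ready. The sketch below is purely illustrative; interpret_chunk and the simulated microphone are hypothetical stand-ins, not the actual Seed LiveInterpret 2.0 API.

```python
import asyncio
from typing import Optional

# Hypothetical stand-in for a streaming interpreter; the real model's API is not public.
async def interpret_chunk(audio_chunk: bytes) -> Optional[bytes]:
    """Return translated speech for this chunk once enough context exists."""
    await asyncio.sleep(0.1)              # stands in for model inference time
    return b"<translated-audio>" if audio_chunk else None

async def listen(mic_chunks, in_queue: asyncio.Queue):
    """Push source-language audio into the pipeline as it arrives."""
    for chunk in mic_chunks:
        await in_queue.put(chunk)
    await in_queue.put(None)              # end-of-stream marker

async def speak(in_queue: asyncio.Queue):
    """Emit translated speech while listening continues (full duplex)."""
    while (chunk := await in_queue.get()) is not None:
        out = await interpret_chunk(chunk)
        if out:
            print("play:", out)           # would be sent to the audio device

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    fake_mic = [b"chunk-%d" % i for i in range(5)]   # simulated audio frames
    await asyncio.gather(listen(fake_mic, queue), speak(queue))

asyncio.run(main())
```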


Furthermore, Seed LiveInterpret 2.0 supports zero-shot voice cloning, promising more natural and seamless communication.

Currently, the model primarily focuses on Chinese-English translation.

Here’s how Seed LiveInterpret 2.0 sets itself apart from traditional machine interpretation systems:

Translation Accuracy Approaching Human-Level Interpretation

The system achieves over 70% accuracy in bidirectional Chinese-English translation in complex scenarios like multi-person conferences, and exceeds 80% accuracy in single-person speeches, rivaling professional human interpreters.

Ultra-Low Latency “Listen and Speak” Capability

The translation delay can be as low as 2-3 seconds, a reduction of over 60% compared to traditional machine interpretation systems.

Zero-Shot Voice Cloning

By sampling real-time speech signals, the system can extract voice characteristics and “speak” the foreign language in the speaker’s own voice.
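As a rough illustration of how zero-shot cloning generally works, a short reference sample is condensed into a fixed-size speaker embedding, and synthesis is conditioned on that embedding without any per-speaker training. The helper functions below are hypothetical placeholders for a trained speaker encoder and TTS decoder, not ByteDance's implementation.

```python
import numpy as np

def extract_speaker_embedding(reference_audio: np.ndarray, dim: int = 256) -> np.ndarray:
    """Condense a few seconds of reference speech into a fixed-size voice vector."""
    # Stand-in for a speaker encoder: summary statistics over framed audio.
    frames = reference_audio[: len(reference_audio) // dim * dim].reshape(-1, dim)
    return frames.mean(axis=0)

def synthesize(text: str, speaker_embedding: np.ndarray) -> np.ndarray:
    """Stand-in for a TTS decoder conditioned on the cloned voice (zero-shot, no fine-tuning)."""
    duration = 16_000 * max(1, len(text) // 10)      # fake waveform length
    return np.full(duration, speaker_embedding.mean(), dtype=np.float32)

# Usage: sample a short utterance from the live speaker, then "speak" the
# translation in that same voice without any per-speaker training step.
reference = np.random.default_rng(0).standard_normal(48_000)   # ~3 s at 16 kHz
voice = extract_speaker_embedding(reference)
translated_audio = synthesize("Hello, everyone.", voice)
print(translated_audio.shape)
```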

Intelligent Balancing of Translation Quality, Latency, and Speech Output Cadence

The model intelligently adjusts the output pace based on speech clarity, fluency, and complexity, adapting to the nuances of different languages.
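One way to picture this balancing act is a simple emission policy: commit translated output early when the input is clear and simple, and buffer a little longer when it is noisy or dense, under a hard latency cap. The thresholds and feature names below are illustrative assumptions, not the model's actual policy.

```python
def should_emit(confidence: float, complexity: float, buffered_seconds: float) -> bool:
    """Decide whether to speak now or keep listening a little longer."""
    min_wait = 1.0 + 1.5 * complexity   # harder input -> wait for more context
    max_wait = 3.0                      # hard cap to keep latency around 2-3 s
    if buffered_seconds >= max_wait:
        return True                     # never fall too far behind the speaker
    return confidence > 0.85 and buffered_seconds >= min_wait

# Example: a clear, simple sentence is emitted early; a dense one waits longer.
print(should_emit(confidence=0.95, complexity=0.1, buffered_seconds=1.3))  # True
print(should_emit(confidence=0.70, complexity=0.8, buffered_seconds=1.3))  # False
```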

Model evaluation results indicate that in speech-to-text simultaneous interpretation tasks, Seed LiveInterpret 2.0 achieved an average human evaluation score of 74.8 for Chinese-English translation quality (assessing translation accuracy, with a maximum score of 100), surpassing the second-ranked benchmark system (47.3 points) by 58%.
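For readers checking the arithmetic, the 58% figure reads as the relative improvement of 74.8 over 47.3:

```python
# Quick check of the reported gap on the 100-point human evaluation scale.
seed_score, runner_up = 74.8, 47.3
relative_gain = (seed_score - runner_up) / runner_up
print(f"{relative_gain:.1%}")   # about 58%
```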


In speech-to-speech tasks, a capability only three systems in the industry currently support, Seed LiveInterpret 2.0 achieved an average Chinese-English translation quality score of 66.3 (evaluating translation accuracy, speech output latency, speech rate, pronunciation, fluency, and other indicators, with a maximum score of 100), significantly exceeding the other benchmark systems and approaching the level of professional human simultaneous interpreters. This marks a pivotal step and raises the bar for future developments in AI translation.

Furthermore, most benchmark systems do not support the voice cloning function.

Regarding latency performance, Seed LiveInterpret 2.0 has an average first-word output latency of only 2.21 seconds in speech-to-text scenarios and only 2.53 seconds in speech-to-speech scenarios, striking a balance between translation quality and real-time performance. These results signal a technological advancement that could fundamentally alter the global communications landscape.



Original article, Author: Tobias. If you wish to reprint this article, please indicate the source: https://aicnbc.com/5530.html
