Sparse Attention

  • DeepSeek V3.2 Achieves GPT‑5‑Level Performance While Cutting Training Costs by 90%

    DeepSeek’s new V3.2 model matches OpenAI’s upcoming GPT‑5 on reasoning benchmarks while using a fraction of the training FLOPs, thanks to its DeepSeek Sparse Attention (DSA) architecture and efficient token selection. The open‑source base model (93.1 % AIME accuracy) and the higher‑performing V3.2‑Speciale variant (gold‑medal scores on the 2025 IMO and IOI) show that advanced AI no longer requires massive compute budgets. Enterprise users can deploy the models on‑premise, benefiting from lower cost, strong coding performance, and retained reasoning traces, though DeepSeek plans to improve factual coverage and generation fluency.

    January 18, 2026
  • DeepSeek-V3.2-Exp: What’s New in DeepSeek’s Latest Model

    DeepSeek’s experimental V3.2-Exp model introduces “DeepSeek Sparse Attention” (DSA) to improve efficiency, reduce costs, and handle longer documents. DSA filters out less relevant tokens, potentially halving operational costs. While this promises faster, cheaper AI deployment, there are concerns that filtering may discard subtle but important context and affect model reliability. Optimized for Chinese AI chips, DeepSeek’s open-source approach encourages collaboration but may face patent challenges. The focus on efficiency positions DeepSeek competitively in the evolving AI landscape.

    September 30, 2025
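The articles above do not describe DSA's exact selection mechanism, but the core idea they reference — attending only to the most relevant tokens instead of all of them — can be sketched in a toy form. The function name and the simple dot-product scoring below are illustrative assumptions, not DeepSeek's actual implementation:

```python
import math

def topk_sparse_attention(q, keys, values, k=4):
    # Sketch of top-k sparse attention: score every key against the query,
    # keep only the k highest-scoring tokens, then apply softmax attention
    # over that subset. Illustrative only -- DSA's real selector is not
    # described in the articles above.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d)
              for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]  # stabilized softmax
    z = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for j, v in enumerate(values[i]):
            out[j] += (w / z) * v
    return out

# Usage: 6 tokens with 2-dim embeddings, attend to the top 2 only.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-0.5, 0.5]]
vals = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [2.0, 2.0], [3.0, 0.0], [0.0, 3.0]]
out = topk_sparse_attention(q, keys, vals, k=2)
print(len(out))  # 2
```

Because attention weights are computed for only k tokens rather than the full sequence, the per-query cost drops from O(n) to roughly O(k) after scoring, which is the source of the cost savings the articles describe.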