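Title:
vLLM Releases v0.20 with DeepSeek V4 MegaMoE Support and 4× KV Cache Capacity
Summary:
The vLLM team has released v0.20.0, which focuses on memory efficiency and MoE inference performance. The release integrates TurboQuant 2-bit KV cache, raising KV cache capacity 4×, and re-enables FA4 so that MLA prefill can run on SM90+ architectures.
The release introduces a new vLLM IR foundation that fuses RMSNorm operations, which reportedly cuts end-to-end latency by 2.1%. It also extends DeepSeek V4 MegaMoE support to platforms including Blackwell, Jetson Thor, ROCm, and Intel XPU, and simplifies GB200/Grace-Blackwell deployment.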
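To make "fusing RMSNorm operations" concrete, here is a minimal pure-Python sketch of what operator fusion means for this one op. It is not vLLM's IR or kernel code: the function names, the `eps` default, and the choice of which steps to merge are assumptions made for illustration. The unfused version materializes an intermediate list at every step; the fused version produces the same values with one reduction and one output pass.

```python
import math

def rmsnorm_unfused(x, weight, eps=1e-6):
    """RMSNorm as a chain of separate ops, each writing an intermediate list."""
    squares = [v * v for v in x]                        # intermediate 1
    mean_sq = sum(squares) / len(x)                     # reduction
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    normalized = [v * inv_rms for v in x]               # intermediate 2
    return [n * w for n, w in zip(normalized, weight)]  # final scale by weight

def rmsnorm_fused(x, weight, eps=1e-6):
    """Same math with the elementwise steps merged; no intermediate lists."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

# Hypothetical input vector and learned scale, chosen only for the demo.
x = [3.0, 4.0]
weight = [1.0, 0.5]
out_unfused = rmsnorm_unfused(x, weight)
out_fused = rmsnorm_fused(x, weight)
```

On a GPU the benefit comes from fewer kernel launches and fewer round trips to memory, which this sketch can only mimic by dropping the intermediate lists. The 2.1% latency figure is the one reported for the release and is not something this example measures.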
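Industry analysis indicates that B300 chips can deliver up to 8× the performance of H200 on DeepSeek V4 inference. vLLM is also working on benchmarks with DeepGEMM MegaMoE, which fuses EP dispatch, GEMM computation, and SwiGLU into a single kernel and may further reduce communication overhead.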
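The three stages that DeepGEMM MegaMoE is described as fusing can be sketched separately in plain Python. This is an illustrative toy, not DeepGEMM or vLLM code: top-1 routing, the helper names, and the tiny weight matrices are all assumptions, and the "dispatch" here is a local grouping of tokens, whereas real expert parallelism exchanges tokens between devices.

```python
import math

def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(x, w):
    """Row vector x (length d_in) times matrix w (d_in rows, d_out columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def swiglu_expert(x, w_gate, w_up, w_down):
    """One expert FFN: (silu(x @ w_gate) * (x @ w_up)) @ w_down."""
    gate = [silu(g) for g in matvec(x, w_gate)]   # GEMM + activation
    up = matvec(x, w_up)                          # GEMM
    hidden = [g * u for g, u in zip(gate, up)]    # SwiGLU gating
    return matvec(hidden, w_down)                 # GEMM

def moe_forward(tokens, expert_ids, experts):
    """Unfused MoE layer with top-1 routing (hypothetical simplification)."""
    # Stage 1: dispatch, i.e. group token indices by their assigned expert.
    buckets = {}
    for idx, expert in enumerate(expert_ids):
        buckets.setdefault(expert, []).append(idx)
    # Stages 2 and 3: per-expert GEMMs and SwiGLU, written back in token order.
    out = [None] * len(tokens)
    for expert, idxs in buckets.items():
        for idx in idxs:
            out[idx] = swiglu_expert(tokens[idx], *experts[expert])
    return out

# Two toy experts with 2x2 weights: expert 0 is all identity matrices,
# expert 1 has a zero down-projection, so its output is always zero.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
experts = [(identity, identity, identity), (identity, identity, zeros)]
tokens = [[0.0, 2.0], [1.0, 1.0]]
outputs = moe_forward(tokens, expert_ids=[0, 1], experts=experts)
```

Written this way, each stage hands its result to the next through memory. A single fused kernel, as described for MegaMoE, aims to avoid those hand-offs, which is where the expected reduction in communication overhead would come from.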
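Key Takeaways:
vLLM v0.20 adds 2-bit KV cache to increase capacity
B300 reaches up to 8× H200 performance on DeepSeek V4 inference
New IR foundation fuses RMSNorm to reduce latency
DeepGEMM MegaMoE fuses multiple operators into one kernel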
Title:
Nvidia, Poolside, OpenAI Release New Models Amid GPT-6 Speculation
Summary:
Nvidia, Poolside, and OpenAI researcher Alec Radford released new AI models on April 27–28, 2026, though their long-term impact remains uncertain. Speculation about OpenAI’s GPT-6 also began circulating in AI communities over the same two days. The releases were part of a broader wave of model activity, even though the window itself brought few major announcements.
vLLM v0.20.0 introduced significant inference optimizations, including TurboQuant 2-bit KV cache for a 4× capacity increase, re-enabled FA4 for MLA prefill on SM90+ GPUs, and a new intermediate representation (IR) foundation. The release also added support for DeepSeek V4 MegaMoE on Blackwell, Jetson Thor, ROCm, and Intel XPU platforms, alongside simplified GB200/Grace-Blackwell deployment. SemiAnalysis reported that B300 GPUs deliver up to 8× the performance of H200 on DeepSeek V4 workloads in disaggregated setups.
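A rough back-of-the-envelope calculation shows how element width translates into KV cache capacity. The sketch below is idealized and rests on assumptions the summary does not state: the model shape and memory budget are hypothetical, quantization metadata such as scales is ignored, and the formula covers standard per-head K/V storage rather than MLA's compressed cache. Under those assumptions, the 4× figure corresponds to going from 8-bit to 2-bit elements; the summary does not say which baseline the release notes use.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bits_per_element):
    """Idealized KV cache footprint of one token, in bytes."""
    # Factor 2: one key vector and one value vector per layer and KV head.
    elements = 2 * num_layers * num_kv_heads * head_dim
    return elements * bits_per_element / 8

def tokens_that_fit(budget_bytes, **shape):
    """How many tokens of KV cache fit in a fixed memory budget."""
    return int(budget_bytes // kv_bytes_per_token(**shape))

# Hypothetical model shape and cache budget, not taken from any real model.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128)
budget = 16 * 1024**3  # 16 GiB reserved for the KV cache

tokens_8bit = tokens_that_fit(budget, bits_per_element=8, **shape)
tokens_2bit = tokens_that_fit(budget, bits_per_element=2, **shape)
capacity_ratio = tokens_2bit / tokens_8bit
print(tokens_8bit, tokens_2bit, capacity_ratio)
```

In this idealized arithmetic a 16-bit baseline would yield 8× instead, so real-world gains depend on both the baseline format and the per-block metadata a quantization scheme has to store.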
The developments reflect growing focus on inference efficiency and hardware-software co-design. DeepSeek’s use of TileKernel signals a strategic shift away from CUDA dependency, promoting stack portability. These trends underscore the industry’s move toward specialized kernels and heterogeneous hardware support.
Key Takeaways:
vLLM v0.20.0 boosts KV cache capacity 4× with 2-bit TurboQuant
DeepSeek V4 leverages TileKernel to reduce CUDA lock-in
B300 GPUs show up to 8× speed gains over H200 in early tests
New model releases from Nvidia, Poolside, and Alec Radford emerge