Machine Learning

Frontier advances spanning LLMs, diffusion models, graph learning, and efficient architectures.

DeepSeek V3 / R1: Mixture-of-Experts and Multi-head Latent Attention

DeepSeek · 2025–2026

Demonstrated that a 671B-parameter MoE model activating only 37B parameters per token can match the quality of dense models. Multi-head Latent Attention (MLA) shrinks the KV cache by storing a compressed latent vector per token instead of full per-head keys and values, enabling cost-efficient serving at scale. Sparked the "DeepSeek moment", showing that open-weight models can compete with closed-source frontier systems.
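
To make the sparse-activation and cache-size claims concrete, here is a minimal sketch in plain Python. It is not DeepSeek's router or attention code. The toy experts, the softmax-over-top-k gating, and the dimensions passed to `kv_cache_elements_per_token` (128 heads, head size 128, latent size 512, 64 positional dimensions) are illustrative assumptions chosen only to show the arithmetic.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    routes = top_k_route(router_logits, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        expert_out = experts[idx](x)  # unselected experts are never called
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, [idx for idx, _ in routes]

def kv_cache_elements_per_token(n_heads, head_dim, latent_dim=None, rope_dim=0):
    """Cached values per token per layer: full K and V, or one shared latent."""
    if latent_dim is None:
        return 2 * n_heads * head_dim   # keys + values for every head
    return latent_dim + rope_dim        # compressed latent + positional part

# Hypothetical toy experts: expert s multiplies its input by s.
experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
output, used_experts = moe_forward([1.0, 2.0], experts, router_logits, k=2)

# Figures from the summary above: 37B active out of 671B total.
active_fraction = 37 / 671

# Assumed dimensions, for illustration only.
full_cache = kv_cache_elements_per_token(n_heads=128, head_dim=128)
latent_cache = kv_cache_elements_per_token(128, 128, latent_dim=512, rope_dim=64)
```

In this toy run only two of the eight experts execute, mirroring how roughly 5.5% of the parameters (37 / 671) are touched per token. Under the assumed dimensions, the latent cache holds 576 values per token per layer against 32,768 for uncompressed keys and values.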

DiffusionGemma: Breaking Free of Left-to-Right Processing

Google DeepMind · June 2026

A 26B MoE model that generates 256-token blocks in parallel via iterative denoising, giving up to 4x faster inference (1,000 tokens/sec on an H100). Bidirectional attention excels at infilling, reasoning, and non-linear generation. Released under Apache 2.0. Signals a possible paradigm shift: diffusion may rival autoregressive decoding for text.
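
The idea of filling a block in parallel over a few denoising steps, rather than one token at a time, can be sketched as follows. This is a toy illustration and not the released model's algorithm. `toy_denoiser` is a hypothetical stand-in that reads the answer from a known `target` and invents confidence scores, and the "commit the most confident slots each step" schedule is an assumption borrowed from generic masked-diffusion decoding.

```python
MASK = None  # placeholder for a not-yet-decoded position

def toy_denoiser(block, target):
    """Hypothetical model stub: propose (token, confidence) for every masked slot.

    A real denoiser would predict tokens from context; this one reads them
    from `target` and scores slots higher when their neighbours are filled.
    """
    proposals = {}
    for i, tok in enumerate(block):
        if tok is MASK:
            left_filled = i > 0 and block[i - 1] is not MASK
            right_filled = i < len(block) - 1 and block[i + 1] is not MASK
            confidence = 0.5 + 0.2 * left_filled + 0.2 * right_filled
            proposals[i] = (target[i], confidence)
    return proposals

def denoise_block(target, steps):
    """Fill a fully masked block in at most `steps` parallel refinement passes."""
    block = [MASK] * len(target)
    per_step = -(-len(target) // steps)  # ceiling division
    passes = 0
    while any(tok is MASK for tok in block):
        proposals = toy_denoiser(block, target)  # all slots scored at once
        # Commit the most confident proposals; ties go to the leftmost slot.
        best = sorted(proposals, key=lambda i: (-proposals[i][1], i))[:per_step]
        for i in best:
            block[i] = proposals[i][0]
        passes += 1
    return block, passes

target = ["the", "cat", "sat", "on", "the", "warm", "red", "mat"]
decoded, passes = denoise_block(target, steps=4)
```

Here eight tokens are produced in four model passes instead of the eight an autoregressive decoder would need, which is the source of the speedup. Because each pass sees the whole block, tokens on both sides of a gap can inform a proposal.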

SiST-GNN: Simultaneous Spatial-Temporal Message Passing

arXiv:2605.25548 · May 2026

Fuses topology and temporal evolution into a single message-passing operation for dynamic graphs. Sets a new SOTA on link prediction, outperforming prior methods by 109–277%. Demonstrates that simultaneous spatial-temporal reasoning beats sequential chaining of separate spatial and temporal stages.
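
As a rough illustration of what fusing topology and time into one message-passing step can look like, the sketch below weights each neighbour message by the recency of the edge that carries it. This is not the paper's operator. The exponential time-decay weighting, the `decay` parameter, and the function name `fused_st_message_passing` are assumptions made for this example.

```python
import math

def fused_st_message_passing(features, edges, query_time, decay=1.0):
    """One aggregation step over timestamped edges (source, target, time).

    Each message weight depends jointly on graph structure (which edges
    exist) and on time (how recent each edge is), in a single pass.
    """
    dim = len(next(iter(features.values())))
    agg = {node: [0.0] * dim for node in features}
    norm = {node: 0.0 for node in features}
    for src, dst, t in edges:
        if t > query_time:
            continue  # ignore edges from the future to avoid leakage
        weight = math.exp(-decay * (query_time - t))
        for d in range(dim):
            agg[dst][d] += weight * features[src][d]
        norm[dst] += weight
    updated = {}
    for node in features:
        if norm[node] > 0:
            updated[node] = [a / norm[node] for a in agg[node]]
        else:
            updated[node] = list(features[node])  # no messages: keep features
    return updated

features = {"a": [1.0], "b": [3.0], "c": [0.0]}
edges = [("a", "c", 1.0), ("b", "c", 2.0), ("a", "b", 5.0)]
updated = fused_st_message_passing(features, edges, query_time=2.0,
                                   decay=math.log(2))
```

With a decay of ln 2, the edge from `a` (one time unit old) gets half the weight of the edge from `b` (current), so node `c` moves closer to `b`'s feature. A sequential design would first aggregate over neighbours and only afterwards apply a separate temporal model.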