Self-distillation lets LLMs acquire new skills without erasing old ones
Recommended for you

Nvidia’s Dynamic Memory Sparsification slashes LLM reasoning memory costs by up to 8x
Nvidia researchers introduced Dynamic Memory Sparsification (DMS), a retrofit that compresses the KV cache so large language models can reason over longer contexts with far less GPU memory. In benchmarks, DMS reduced cache footprint by as much as eightfold, raised throughput up to five times for some models, and improved task accuracy under fixed memory budgets.
University of Maryland team embeds 3x LLM inference speed into model weights
Researchers from the University of Maryland, Lawrence Livermore, Columbia, and TogetherAI demonstrate a weight-level multi-token prediction adaptation that yields roughly 3x inference throughput with modest accuracy trade-offs. The technique uses a single special embedding token plus a ConfAdapt confidence gate to accelerate predictable segments while preserving quality on hard tokens.
Internal debates inside advanced LLMs unlock stronger reasoning and auditability
A Google-led study finds that high-performing reasoning models develop internal, multi-perspective debates that materially improve complex planning and problem-solving. The research implies practical shifts for model training, prompt design, and enterprise auditing—favoring conversational, messy training data and transparency over sanitized monologues.
Microsoft research shows a single fine-tuning example can erode safety across major LLMs
Microsoft researchers demonstrate that a single, innocuous-seeming training example can substantially weaken safety behavior across a range of language and image models, raising urgent enterprise governance questions. The technique exploits a common optimization approach to reinforce harmful completions while preserving model utility, producing large increases in permissive outputs on standard safety benchmarks.

OpenAI pushes agents from ephemeral assistants to persistent workers with memory, shells, and Skills
OpenAI’s Responses API now adds server-side state compaction, hosted shell containers, and a Skills packaging standard to support long-running, reproducible agent workflows. Early partner reports and ecosystem moves (including large-context advances from rivals) suggest the feature set is accelerating production adoption while concentrating responsibility for governance, secrets, and runtime controls.
Nvidia Nemotron-Cascade 2: Post‑Training Playbook Upsets Size Orthodoxy
Nvidia’s Nemotron-Cascade 2 uses a sequential post-training recipe to deliver top-tier math and coding performance while activating only 3B parameters at inference. The Cascade RL pipeline plus MOPD token-level distillation signals a shift toward intelligence-density strategies that cut serving cost and raise the value of training orchestration. Public materials across the Nemotron family sometimes report divergent headline sizes, a difference that likely reflects measurement conventions rather than an architectural contradiction.
OpenAI accelerates theoretical-physics calculations with model collaboration
OpenAI-backed models helped researchers solve complex gluon calculations, producing two preprints in early 2026 and compressing timelines from months to weeks. Company-published usage statistics and cross-vendor demonstrations suggest this episode is part of a broader move toward agentic, model-in-the-loop scientific workflows, but widespread adoption depends on urgent investment in provenance, formal verification, and new institutional practices.

Waymo’s new simulation engine aims to accelerate robotaxi scaling
Waymo has published technical details of a large-scale simulation system—built atop Google DeepMind’s Genie 3 and tailored to the driving domain—to generate multi-sensor virtual environments and rare-event scenarios. The capability, combined with recent funding and city expansions, is positioned to speed validation and deployment of its robotaxi fleet while concentrating scrutiny on simulation fidelity and regulatory oversight.