Inside: NVIDIA ships RL training for any coding agent → Together AI cuts KV cache 8x → EAGLE 3.1 kills attention drift → Stability goes open on audio...

👋 Hello. You’re reading the AI Dev Brief by MarkTechPost — the daily signal for AI engineers and researchers who build with AI, not just talk about it. No hype. No filler. Just the research, releases, and infrastructure moves that actually matter.

Want to promote your AI Product, GitHub repo, HuggingFace model, product release, or webinar in front of 1,000,000+ AI practitioners? Connect with us

🔥 TODAY’S BRIEFING — STORIES WORTH 5 MINUTES

1. NVIDIA Releases Polar: Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code — Polar, a framework that proxies LLM API calls to reconstruct token-faithful RL trajectories for training agents on unmodified coding harnesses. Most RL systems require white-box model access — Polar removes that constraint entirely. Using simple GRPO, Polar improves Qwen3.5-4B by 22.6 points on SWE-Bench Verified with the Codex harness. Works with Claude Code and Qwen Code too. Open-sourced. You can now do agentic RL training on any closed or open model, at scale.

2. Together AI Open-Sources OSCAR: 2-Bit KV Cache Quantization — 8x Memory Reduction, Near-BF16 Accuracy — OSCAR, an attention-aware 2-bit KV cache quantization system for long-context LLM serving. Every prior INT2 approach collapsed past 32K context — OSCAR doesn't. It reduces KV cache memory by ~8x, boosts serving throughput by up to 7x at large batch sizes, and maintains near-BF16 output accuracy — even at 128K context with Qwen3-8B.

3. EAGLE 3.1: The Speculative Decoding Fix That Eliminates Attention Drift in Deep-Step LLM Inference — The EAGLE team and vLLM have released EAGLE 3.1, fixing attention drift — the key bottleneck behind acceptance-length degradation as speculation depth increases. Deeper speculative steps caused the drafter to gradually shift attention in ways that hurt accuracy. EAGLE 3.1 patches this with FC normalization and post-norm hidden-state feedback, restoring full acceptance rate at deeper depths.

4. Sakana AI Proposes DiffusionBlocks: Train Transformer Networks One Block at a Time — No Full Network in Memory — DiffusionBlocks, a block-wise training framework that converts transformer-based residual networks into independently trainable denoising modules. Standard diffusion training requires the full network in memory for every denoising step — DiffusionBlocks holds only one block in memory at a time, reducing memory requirements proportionally to the number of blocks.
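For a rough sense of where the ~8x figure in the OSCAR item (story 2) comes from, the sketch below does the back-of-the-envelope arithmetic: storing each cached value in 2 bits instead of 16 is an 8x reduction before any overhead. The layer count, KV-head count, and head dimension are assumed values for a Qwen3-8B-like model, not numbers from the release, and the function name is hypothetical. Real quantizers also store scales and zero-points, which is one reason reported savings are approximate.

```python
def kv_cache_bytes(seq_len, bits_per_value,
                   num_layers=36, num_kv_heads=8, head_dim=128):
    """Raw KV cache size in bytes for one sequence.

    Ignores quantization metadata (scales, zero-points) and any padding.
    The default shape values are assumptions for illustration only.
    """
    # Each token stores one key and one value vector per layer per KV head.
    values_per_token = 2 * num_layers * num_kv_heads * head_dim
    total_bits = values_per_token * seq_len * bits_per_value
    return total_bits // 8


context = 128 * 1024  # 128K tokens
bf16 = kv_cache_bytes(context, 16)
int2 = kv_cache_bytes(context, 2)

print(f"BF16 cache:  {bf16 / 2**30:.2f} GiB")
print(f"2-bit cache: {int2 / 2**30:.2f} GiB")
print(f"Reduction:   {bf16 / int2:.1f}x")
```

Under these assumed dimensions, a single 128K-token sequence drops from 18.00 GiB to 2.25 GiB of raw cache, which is what frees up room for larger batches.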

Fast browsing. Faster thinking.

Your browser gets you to a page. Norton Neo gets you to the answer. The first safe AI-native browser built by Norton moves with you from idea to action without slowing you down. Magic Box understands your intent before you finish typing. AI that works inside your flow, not beside it. No prompting. No copy-pasting. No switching apps.

Built-in AI, instantly and for free. Privacy handled by Norton. Built-in VPN and ad blocking protect you by default. No configuration. No extra apps. Nothing to think about.

Fast. Safe. Intelligent. That's Neo.

Download Norton Neo

📰 Secondary News

MEMO: Encode New Knowledge Into a Dedicated Memory Model — LLM Parameters Untouched — MEMO (Memory as a Model) is a modular framework that encodes new knowledge into a dedicated separate memory model using a five-step reflection QA pipeline — the base LLM is never modified. New knowledge slots in as a module. No fine-tuning, no RAG pipeline. Plug in new information, pull it back out via the memory model. A clean architectural solution to LLM knowledge staleness.

OmniVoice Studio: Local Open-Source Alternative to ElevenLabs — Voice Cloning, Dubbing, Diarization — OmniVoice Studio is an open-source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization — running fully local, no cloud dependency. ElevenLabs charges per character. OmniVoice runs on your machine for free. Available on GitHub now. The open audio stack keeps filling in.

Stability AI Releases Stable Audio 3: Open-Weight Family for Music and Audio Generation — Up to 6m 20s of Stereo Output — Stable Audio 3, a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Generates up to 6 minutes 20 seconds of stereo audio from text prompts — music, sound effects, and edits.

🛠️ More Releases/Updates for AI Devs

A. Microsoft: Shipped computer-using agents in Copilot Studio to general availability across all commercial Power Platform geographies — the first major cloud platform to reach production-grade computer use. Agents can now operate any web or desktop app a human can use, without custom APIs, running on Windows 365 Cloud PCs with Azure Key Vault credential storage and Microsoft Purview audit logging. The May update also ships a redesigned workflow canvas, native voice capabilities, and Work IQ extensibility for agent-to-agent collaboration.

B. Google: Launched Managed Agents in the Gemini API in public preview — a single API call that spins up a stateful, sandboxed agent capable of reasoning, tool use, and code execution in an isolated Linux environment. Also released the general-purpose Antigravity Agent (antigravity-preview-05-2026) in public preview — it can plan, write code, manage files, and browse the web inside its container. Available via the Interactions API and AI Studio.

C. Anthropic: Rolled out a fresh batch of Claude Code developer platform updates this week: ANTHROPIC_WORKSPACE_ID env variable for workload identity federation, claude agents --cwd <path> for scoping session lists to a directory, a "Summarize up to here" rewind option to compress earlier context, and background agents that now preserve their permission mode on launch. Cache diagnostics also launched in public beta to explain prompt cache misses in real time.

D. NVIDIA: Released GLM-5.1-NVFP4 on HuggingFace — a quantized deployment build of Z.ai's 754B MoE open-source model optimized for NVIDIA tensor cores. Supports 200K context, 8x H200 or A100 serving via vLLM and SGLang, and is the highest-ranked open-source model on Code Arena as of this week. MIT licensed, commercially deployable, no restrictions.

F. GitHub: Announced that usage-based billing via GitHub AI Credits goes live June 1, five days away. All Copilot plans switch from premium requests to token-based metering, where 1 AI Credit = $0.01 USD. Code completions remain free. Frontier-model agent sessions will cost more; developers on monthly plans migrate automatically with no action needed. Annual plans are grandfathered until renewal, but model multipliers increase June 1.
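To make the GitHub credit conversion concrete, here is a minimal sketch of the arithmetic. Only the rate (1 AI Credit = $0.01 USD) comes from the announcement; the helper names and the sample amounts are hypothetical, and per-model token pricing is not modeled because the item above does not state it.

```python
from decimal import Decimal

# Stated in the announcement: 1 AI Credit = $0.01 USD.
USD_PER_CREDIT = Decimal("0.01")


def credits_to_usd(credits):
    """Convert a credit count to dollars, rounded to whole cents."""
    return (Decimal(credits) * USD_PER_CREDIT).quantize(Decimal("0.01"))


def usd_to_credits(usd):
    """Convert a dollar budget to whole credits (fractions are dropped)."""
    return int(Decimal(str(usd)) / USD_PER_CREDIT)


# Hypothetical figures, for illustration only.
print(credits_to_usd(1500))
print(usd_to_credits(25))
```

Decimal is used instead of float so that cent amounts stay exact when summed across many sessions.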
