Here is today’s AI Dev Brief from Marktechpost, covering core research, models, infrastructure tools, and applied updates for AI developers and researchers.

NVIDIA AI Releases Orchestrator-8B: A Reinforcement Learning Trained Controller for Efficient Tool and Model Selection

Orchestrator-8B is an 8B-parameter controller that learns to route requests across tools and LLMs instead of solving everything with one frontier model. It formulates multi-step tool use as a Markov Decision Process, optimizes a multi-objective reward that mixes task success, monetary cost, latency, and user preferences, and trains at scale on ToolScale synthetic tasks. On Humanity’s Last Exam, FRAMES, and τ²-Bench, Orchestrator-8B outperforms GPT-5 tool baselines while running at about 30 percent of their cost and with roughly 2.5× lower latency, mainly because it distributes calls across specialist models, web search, retrieval, and code execution in a more cost-aware way. Read the full launch insights/article here.
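The multi-objective reward described above can be pictured as a weighted scalarization of the competing objectives: success and preference alignment are rewarded, while dollar cost and latency are penalized. A minimal sketch follows; the function name and weights are illustrative assumptions, not NVIDIA's actual formulation:

```python
def route_reward(task_success: float, cost_usd: float, latency_s: float,
                 pref_score: float, w_success: float = 1.0,
                 w_cost: float = 0.1, w_latency: float = 0.05,
                 w_pref: float = 0.2) -> float:
    """Scalarize routing objectives into one RL reward signal.

    task_success and pref_score are rewarded; monetary cost (USD) and
    latency (seconds) are penalized. Weights are hypothetical.
    """
    return (w_success * task_success
            + w_pref * pref_score
            - w_cost * cost_usd
            - w_latency * latency_s)

# A cheap specialist route that still succeeds scores higher than an
# expensive frontier-model route with the same success rate.
cheap = route_reward(task_success=1.0, cost_usd=0.02, latency_s=2.0, pref_score=0.5)
frontier = route_reward(task_success=1.0, cost_usd=0.30, latency_s=8.0, pref_score=0.5)
```

Under a scalarized reward like this, the RL-trained controller is pushed toward the cheapest, fastest route that still solves the task, which is the cost/latency behavior the benchmarks report.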

MiniMax-M2 is an agent- and code-native LLM designed for end-to-end developer workflows, with strong performance across tools such as Claude Code, Cursor, Cline, and Kilo Code. It uses an interleaved-thinking training approach and a 230B-parameter architecture with 10B parameters activated per step to deliver low latency, high throughput, and robust long-horizon tool use across MCP, shell, browser, retrieval, and code. Pricing starts at $0.3 per 1M input tokens and $1.2 per 1M output tokens, with coding plans from $10 to $50 per month. Check out the deal.
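To make the listed per-million-token rates concrete, here is a small cost estimator; the function name is hypothetical, and the rates are just the headline prices above (actual billing may differ by tier or region):

```python
def minimax_m2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD at the listed per-1M-token rates."""
    INPUT_RATE = 0.3 / 1_000_000   # $0.3 per 1M input tokens
    OUTPUT_RATE = 1.2 / 1_000_000  # $1.2 per 1M output tokens
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A session with 1M input tokens and 1M output tokens costs $1.50.
session_cost = minimax_m2_cost(1_000_000, 1_000_000)
```

At these rates, even a heavy agent session with a million tokens in each direction stays well under the $10 entry-level coding plan.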

DeepSeek AI Releases DeepSeekMath-V2: The Open Weights Maths Model That Scored 118/120 on Putnam 2024

DeepSeekMath-V2 is a 685B-parameter open-weights maths model built on DeepSeek-V3.2-Exp-Base, trained for self-verifiable natural-language theorem proving rather than final-answer accuracy alone. Using a verifier, a meta-verifier, and a proof generator with sequential refinement and scaled test-time compute, it achieves gold-level performance on IMO 2025 and CMO 2024 and scores 118 of 120 on Putnam 2024, showing that open models can now match elite human and proprietary systems on top-tier math competitions. Read the full launch insights/article here.
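The generate-verify-refine loop at the heart of this setup can be sketched as follows. This is a structural sketch only: `generate` and `verify` stand in for the proof generator and verifier models, and the loop shape and budget are assumptions, not DeepSeek's exact pipeline:

```python
def prove_with_refinement(problem, generate, verify, max_rounds=4):
    """Sequential refinement with test-time compute.

    Draft a proof, have the verifier critique it, and feed the
    critique back into the generator until the proof passes or the
    compute budget (max_rounds) is exhausted.
    """
    proof, feedback = None, None
    for _ in range(max_rounds):
        proof = generate(problem, feedback)      # draft or revise a proof
        passed, feedback = verify(problem, proof)  # critique the draft
        if passed:
            return proof
    return proof  # best effort after the budget runs out
```

Scaling test-time compute here just means raising `max_rounds` (or sampling more drafts per round), trading inference cost for a higher chance of a verifier-approved proof.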

StepFun AI Releases Step-Audio-R1: A New Audio LLM that Finally Benefits from Test Time Compute Scaling

StepFun’s Step-Audio-R1 is an open audio-reasoning LLM built on Qwen2-Audio and Qwen2.5 32B. It uses Modality-Grounded Reasoning Distillation and Reinforcement Learning with Verified Rewards to turn long chain-of-thought reasoning from a liability into an accuracy gain, surpassing Gemini 2.5 Pro and approaching Gemini 3 Pro on comprehensive audio benchmarks spanning speech, environmental sound, and music. It also ships a reproducible training recipe and vLLM-based deployment for real-world audio applications. Read the full launch insights/article here.

Project Notebooks/Tutorials

▶ [Open Source] Rogue: An Open-Source AI Agent Evaluator Worth Trying Codes & Examples

▶ A Coding Guide to Design an Agentic AI System Using a Control-Plane Architecture for Safe, Modular, and Scalable Tool-Driven Reasoning Workflows Codes Tutorial

▶ A Coding Implementation for an Agentic AI Framework that Performs Literature Analysis, Hypothesis Generation, Experimental Planning, Simulation, and Scientific Reporting Codes Tutorial

How was today’s email?

Awesome  |   Decent    |  Not Great
