Here is today’s AI Dev Brief from Marktechpost, covering core research, models, infrastructure tools, and applied updates for AI developers and researchers.

NVIDIA AI Releases Orchestrator-8B: A Reinforcement Learning Trained Controller for Efficient Tool and Model Selection

Orchestrator-8B is an 8B-parameter controller that learns to route requests across tools and LLMs instead of solving everything with one frontier model. It formulates multi-step tool use as a Markov Decision Process, optimizes a multi-objective reward that mixes task success, monetary cost, latency, and user preferences, and trains at scale on ToolScale synthetic tasks. On Humanity’s Last Exam, FRAMES, and τ²-Bench, Orchestrator-8B outperforms GPT-5 tool baselines while running at about 30 percent of their cost and with roughly 2.5× lower latency, mainly because it distributes calls across specialist models, web search, retrieval, and code execution in a more cost-aware way. Read the full launch insights/article here.
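The multi-objective reward described above can be pictured as a weighted scalarization of the competing objectives: success and preference alignment are rewarded, while dollar cost and latency are penalized. A minimal sketch follows; the function name and weights are illustrative assumptions, not NVIDIA's actual formulation:

```python
def route_reward(task_success: float, cost_usd: float, latency_s: float,
                 pref_score: float, w_success: float = 1.0,
                 w_cost: float = 0.1, w_latency: float = 0.05,
                 w_pref: float = 0.2) -> float:
    """Scalarize routing objectives into one RL reward signal.

    task_success and pref_score are rewarded; monetary cost (USD) and
    latency (seconds) are penalized. Weights are hypothetical.
    """
    return (w_success * task_success
            + w_pref * pref_score
            - w_cost * cost_usd
            - w_latency * latency_s)

# A cheap specialist route that still succeeds scores higher than an
# expensive frontier-model route with the same success rate.
cheap = route_reward(task_success=1.0, cost_usd=0.02, latency_s=2.0, pref_score=0.5)
frontier = route_reward(task_success=1.0, cost_usd=0.30, latency_s=8.0, pref_score=0.5)
```

Under a scalarized reward like this, the RL-trained controller is pushed toward the cheapest, fastest route that still solves the task, which is the cost/latency behavior the benchmarks report.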

MiniMax-M2 is an agent- and code-native LLM designed for end-to-end developer workflows, with strong performance across tools such as Claude Code, Cursor, Cline, and Kilo Code. It uses an interleaved-thinking training approach and a 230B-parameter architecture with 10B parameters activated per step to deliver low latency, high throughput, and robust long-horizon tool use across MCP, shell, browser, retrieval, and code. Pricing starts at $0.3 per 1M input tokens and $1.2 per 1M output tokens, with coding plans from $10 to $50 per month. Check out the deal.
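To make the listed per-million-token rates concrete, here is a small cost estimator; the function name is hypothetical, and the rates are just the headline prices above (actual billing may differ by tier or region):

```python
def minimax_m2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD at the listed per-1M-token rates."""
    INPUT_RATE = 0.3 / 1_000_000   # $0.3 per 1M input tokens
    OUTPUT_RATE = 1.2 / 1_000_000  # $1.2 per 1M output tokens
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A session with 1M input tokens and 1M output tokens costs $1.50.
session_cost = minimax_m2_cost(1_000_000, 1_000_000)
```

At these rates, even a heavy agent session with a million tokens in each direction stays well under the $10 entry-level coding plan.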

DeepSeek AI Releases DeepSeekMath-V2: The Open Weights Maths Model That Scored 118/120 on Putnam 2024

DeepSeekMath-V2 is a 685B-parameter open-weights maths model built on DeepSeek-V3.2-Exp-Base, trained for self-verifiable natural-language theorem proving rather than final-answer accuracy alone. Using a verifier, a meta-verifier, and a proof generator with sequential refinement and scaled test-time compute, it achieves gold-level performance on IMO 2025 and CMO 2024 and scores 118 of 120 on Putnam 2024, showing that open models can now match elite human and proprietary systems on top-tier math competitions. Read the full launch insights/article here.
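The generate-verify-refine loop at the heart of this setup can be sketched as follows. This is a structural sketch only: `generate` and `verify` stand in for the proof generator and verifier models, and the loop shape and budget are assumptions, not DeepSeek's exact pipeline:

```python
def prove_with_refinement(problem, generate, verify, max_rounds=4):
    """Sequential refinement with test-time compute.

    Draft a proof, have the verifier critique it, and feed the
    critique back into the generator until the proof passes or the
    compute budget (max_rounds) is exhausted.
    """
    proof, feedback = None, None
    for _ in range(max_rounds):
        proof = generate(problem, feedback)      # draft or revise a proof
        passed, feedback = verify(problem, proof)  # critique the draft
        if passed:
            return proof
    return proof  # best effort after the budget runs out
```

Scaling test-time compute here just means raising `max_rounds` (or sampling more drafts per round), trading inference cost for a higher chance of a verifier-approved proof.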

StepFun AI Releases Step-Audio-R1: A New Audio LLM that Finally Benefits from Test Time Compute Scaling

StepFun’s Step-Audio-R1 is an open audio-reasoning LLM built on Qwen2-Audio and Qwen2.5 32B. It uses Modality-Grounded Reasoning Distillation and Reinforcement Learning with Verified Rewards to turn long chain-of-thought reasoning from a liability into an accuracy gain, surpassing Gemini 2.5 Pro and approaching Gemini 3 Pro on comprehensive audio benchmarks spanning speech, environmental sound, and music. It also ships a reproducible training recipe and vLLM-based deployment for real-world audio applications. Read the full launch insights/article here.

Project Notebooks/Tutorials

▶ [Open Source] Rogue: An Open-Source AI Agent Evaluator Worth Trying Codes & Examples

▶ A Coding Guide to Design an Agentic AI System Using a Control-Plane Architecture for Safe, Modular, and Scalable Tool-Driven Reasoning Workflows Codes Tutorial

▶ A Coding Implementation for an Agentic AI Framework that Performs Literature Analysis, Hypothesis Generation, Experimental Planning, Simulation, and Scientific Reporting Codes Tutorial

How was today’s email?

Awesome  |   Decent    |  Not Great
