Editor’s Pick

You should not miss this one

[Comparison of OCR Models] Comparing the Top 6 OCR (Optical Character Recognition) Models/Systems in 2025. Optical character recognition has moved from plain text extraction to document intelligence. Modern systems must read scanned and digital PDFs in one pass, preserve layout, detect tables, extract key-value pairs, and work across multiple languages. Many teams now also want OCR that can feed RAG and agent pipelines directly. The goal of this comparison is not to rank the systems on a single metric, because they target different constraints. Instead, it shows which system to use for a given document volume, deployment model, language set, and downstream AI stack…
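"Feeding RAG pipelines directly" usually means turning per-page OCR output into retrieval-ready records with source metadata attached. A minimal sketch in plain Python; the field names (`text`, `type`, `page`) are illustrative assumptions, not any specific OCR system's schema:

```python
# Sketch: package per-page OCR output into RAG-ready chunks.
# Field names are illustrative assumptions, not a real OCR schema.

def to_rag_chunks(pages, max_chars=500):
    """Split OCR'd pages into chunks, keeping page and block-type
    metadata so a retriever can cite the source location."""
    chunks = []
    for page_num, blocks in enumerate(pages, start=1):
        for block in blocks:
            text = block["text"].strip()
            # Split long blocks so each chunk fits an embedding window.
            for i in range(0, len(text), max_chars):
                chunks.append({
                    "text": text[i:i + max_chars],
                    "page": page_num,
                    "type": block.get("type", "paragraph"),
                })
    return chunks

pages = [
    [{"text": "Invoice No: 1234", "type": "key_value"},
     {"text": "A" * 1200, "type": "paragraph"}],
]
chunks = to_rag_chunks(pages)
print(len(chunks))  # 4: one key-value chunk, three paragraph chunks
```

Keeping `page` and `type` on each chunk is what lets a downstream agent answer "where did this number come from?" instead of returning bare text.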

AI Dev and Latest Releases

[SLM] Google AI Unveils Supervised Reinforcement Learning (SRL): A Step-Wise Framework with Expert Trajectories to Teach Small Language Models to Reason through Hard Problems. How can a small model learn to solve tasks it currently fails at, without rote imitation and without relying on a correct rollout? A team of researchers from Google Cloud AI Research and UCLA has released a training framework, Supervised Reinforcement Learning (SRL), that lets 7B-scale models learn from very hard math and agent trajectories that neither standard supervised fine-tuning nor outcome-based reinforcement learning (RL) can learn from.
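The core contrast with outcome-based RL is that each step of an expert trajectory supplies a learning signal, not just the final answer. A toy illustration of that difference (purely illustrative, not the paper's actual reward or training code):

```python
# Toy contrast: outcome-only reward vs step-wise supervision from
# an expert trajectory. Illustrative only; not the SRL objective.

def outcome_reward(rollout, expert):
    # Outcome-based RL: signal only if the final answer matches.
    return 1.0 if rollout[-1] == expert[-1] else 0.0

def stepwise_rewards(rollout, expert):
    # Step-wise supervision: credit every step that matches the
    # expert trajectory, so partial progress still teaches.
    return [1.0 if r == e else 0.0 for r, e in zip(rollout, expert)]

expert = ["expand", "factor", "solve", "x=2"]
rollout = ["expand", "factor", "guess", "x=5"]  # goes wrong at step 3

print(outcome_reward(rollout, expert))         # 0.0 -> no signal at all
print(sum(stepwise_rewards(rollout, expert)))  # 2.0 -> two good steps rewarded
```

On a problem the model always fails, outcome-based RL sees zero reward everywhere; step-wise credit is what gives a 7B model something to climb.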

[Agent Research] DeepAgent: A Deep Reasoning AI Agent that Performs Autonomous Thinking, Tool Discovery, and Action Execution within a Single Reasoning Process. DeepAgent is an end-to-end deep reasoning agent that lets a 32B-class model think, discover tools from large registries (such as 16,000+ RapidAPI APIs and about 3,900 ToolHop tools), call those tools with structured arguments, and then fold the whole trajectory into episodic, working, and tool memories so that long-horizon tasks do not overflow the context. It is trained with Tool Policy Optimization (ToolPO), which uses LLM-simulated APIs and token-level credit assignment. It reports the strongest 32B results across both labeled-tool and open-set tool settings on ToolBench, API Bank, TMDB, Spotify, and ToolHop, and on downstream tasks such as ALFWorld, WebShop, GAIA, and HLE.
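The "fold the trajectory into episodic, working, and tool memories" step can be pictured as compacting a raw event log into three smaller stores, so the full log never has to stay in context. A schematic sketch; the memory schema here is an assumption for illustration, not DeepAgent's actual implementation:

```python
# Sketch: compact an agent trajectory into three memory stores.
# The schema is an illustrative assumption, not DeepAgent's design.

def fold_trajectory(events, working_window=3):
    """Fold a raw event log into episodic, working, and tool
    memories so only a small summary re-enters the context."""
    memories = {"episodic": [], "working": [], "tool": {}}
    for ev in events:
        if ev["kind"] == "tool_call":
            # Tool memory: remember which tools worked, for reuse.
            memories["tool"][ev["tool"]] = ev["ok"]
        elif ev["kind"] == "milestone":
            # Episodic memory: durable summaries of what happened.
            memories["episodic"].append(ev["summary"])
    # Working memory: only the most recent raw events.
    memories["working"] = events[-working_window:]
    return memories

events = [
    {"kind": "tool_call", "tool": "search_flights", "ok": True},
    {"kind": "milestone", "summary": "found 3 candidate flights"},
    {"kind": "tool_call", "tool": "book_flight", "ok": False},
    {"kind": "milestone", "summary": "booking failed, retrying"},
]
m = fold_trajectory(events)
print(m["tool"])          # {'search_flights': True, 'book_flight': False}
print(len(m["working"]))  # 3
```

The payoff is that context growth is bounded by the working window plus summaries, not by trajectory length.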

[Multimodal Open Source] LongCat-Flash-Omni: A SOTA Open-Source Omni-Modal Model with 560B Parameters (27B Activated), Excelling at Real-Time Audio-Visual Interaction. How do you design a single model that can listen, see, read, and respond in real time across text, image, video, and audio without losing efficiency? Meituan’s LongCat team has released LongCat-Flash-Omni, an open-source omni-modal model with 560 billion parameters and about 27 billion active per token, built on the shortcut-connected Mixture-of-Experts design that LongCat-Flash introduced. The model extends the text backbone to vision, video, and audio, and it keeps a 128K context so it can run long conversations and document-level understanding in one stack.
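The "27B active out of 560B" figure is the usual Mixture-of-Experts pattern: a gate selects a few experts per token, so only their parameters run. A minimal top-k gating sketch in plain Python (expert counts and names are illustrative, not LongCat's configuration):

```python
import math

# Sketch of top-k expert gating; sizes are illustrative,
# not LongCat-Flash-Omni's actual configuration.

def top_k_gate(logits, k=2):
    """Pick the k highest-scoring experts for one token and
    renormalize their gate weights with a softmax."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# 8 experts, only 2 run per token -> roughly 2/8 of the expert
# parameters are "active", the same idea behind 27B of 560B.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
gates = top_k_gate(logits, k=2)
print(sorted(gates))                  # [1, 4] -> experts 1 and 4 selected
print(round(sum(gates.values()), 6))  # 1.0 -> weights renormalized
```

Because compute scales with active parameters rather than total parameters, this is what makes real-time audio-visual interaction plausible at 560B scale.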

Project Notebooks/Tutorials

▶ How to Design a Persistent Memory and Personalized Agentic AI System with Decay and Self-Evaluation? Codes Tutorial

▶ How to Design an Autonomous Multi-Agent Data and Infrastructure Strategy System Using Lightweight Qwen Models for Efficient Pipeline Intelligence? Codes Tutorial

▶ How to Build a Fully Functional Computer-Use Agent that Thinks, Plans, and Executes Virtual Actions Using Local AI Models? Codes Tutorial

▶ A Coding Implementation of a Comprehensive Enterprise AI Benchmarking Framework to Evaluate Rule-Based, LLM, and Hybrid Agentic AI Systems Across Real-World Tasks Codes Tutorial

▶ How to Build Ethically Aligned Autonomous Agents through Value-Guided Reasoning and Self-Correcting Decision-Making Using Open-Source Models? Codes Tutorial

▶ How to Build an End-to-End Data Engineering and Machine Learning Pipeline with Apache Spark and PySpark? Codes Tutorial

▶ A Coding Implementation to Build a Transformer-Based Regression Language Model to Predict Continuous Values from Text Codes Tutorial

How was today’s email?

Awesome | Decent | Not Great
