Here is today’s AI Dev Brief from Marktechpost, covering core research, models, infrastructure tools, and applied updates for AI developers and researchers.
Zhipu AI Introduces GLM-OCR: A 0.9B Multimodal OCR Model for Document Parsing and Key Information Extraction (KIE)
GLM-OCR is a 0.9B multimodal OCR model from Zhipu AI designed for real-world document understanding. It combines a 0.4B CogViT visual encoder with a 0.5B GLM decoder and uses Multi-Token Prediction (MTP) to improve decoding efficiency by generating multiple tokens per step. The system runs a two-stage pipeline, PP-DocLayout-V3 layout analysis followed by parallel region-level recognition, and supports both document parsing and Key Information Extraction (KIE) through structured generation in Markdown and JSON. The research reports strong results on benchmarks such as OmniDocBench v1.5, OCRBench (Text), and UniMERNet, along with competitive in-house results on code documents, tables, multilingual text, and receipt KIE, positioning GLM-OCR as a compact OCR system that balances accuracy, throughput, and deployability rather than serving as a general-purpose vision-language model. Read the full analysis/article here.
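Structured generation for KIE means the decoder emits a machine-parseable JSON payload rather than free text, which a downstream consumer can validate field by field. As a hedged illustration only (the field names, the code-fence wrapping, and the `extract_kie_fields` helper below are hypothetical, not GLM-OCR's actual output format or API), such a consumer might look like this:

```python
import json

def extract_kie_fields(model_output: str, required: list[str]) -> dict:
    """Parse a (hypothetical) JSON KIE payload emitted by an OCR model
    and verify that the requested keys are present."""
    # Models sometimes wrap JSON in a Markdown code fence; strip it first.
    text = model_output.strip()
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    fields = json.loads(text)
    missing = [k for k in required if k not in fields]
    if missing:
        raise KeyError(f"KIE output missing fields: {missing}")
    return {k: fields[k] for k in required}

# Example: a receipt-KIE response (contents are illustrative only).
raw = '```json\n{"merchant": "ACME", "total": "12.50", "date": "2026-01-05"}\n```'
result = extract_kie_fields(raw, ["merchant", "total"])
print(result)  # → {'merchant': 'ACME', 'total': '12.50'}
```

Validating required keys at the boundary like this is a common pattern when an OCR or LLM stage feeds a stricter downstream system, since structured generation reduces but does not eliminate malformed outputs.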

Meet OpenViking: An Open-Source Context Database that Brings Filesystem-Based Memory and Retrieval to AI Agent Systems like OpenClaw
Volcengine’s OpenViking is an open-source Context Database for AI agents that manages memory, resources, and skills through a filesystem-style hierarchy instead of a flat RAG pipeline. It combines Directory Recursive Retrieval, L0/L1/L2 tiered context loading, retrieval trajectory visibility, and automatic session-based memory iteration to make agent context more structured, debuggable, and token-efficient. For developers building long-horizon agents, OpenViking is notable because it treats context organization as a core systems layer rather than a prompt-side patch. Read the full analysis/article here.
IBM AI Releases Granite 4.0 1B Speech as a Compact Multilingual Speech Model for Edge AI and Translation Pipelines
IBM’s Granite 4.0 1B Speech is a compact speech-language model built for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST), with support for English, French, German, Spanish, Portuguese, and Japanese. It has half the parameters of granite-speech-3.3-2b, adds Japanese ASR and keyword list biasing, and improves inference efficiency through better encoder training and speculative decoding. The model is released under Apache 2.0, supports deployment through Transformers and vLLM, and fits IBM’s broader push toward smaller, more deployable speech systems for enterprise and resource-constrained environments rather than oversized voice models optimized mainly for demos. Read the full analysis/article here.
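Keyword list biasing nudges decoding toward domain terms (product names, jargon) by boosting their scores before the probability is computed. The framework-free sketch below shows the generic idea only; real systems bias token IDs inside the decoder via techniques like shallow fusion, and this dict-based version is not IBM's implementation:

```python
import math

def bias_logits(logits: dict[str, float], keywords: set[str],
                boost: float = 2.0) -> dict[str, float]:
    """Additively boost the raw scores of keyword tokens before softmax."""
    return {tok: s + (boost if tok in keywords else 0.0)
            for tok, s in logits.items()}

def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Convert raw scores to a probability distribution."""
    z = sum(math.exp(v) for v in scores.values())
    return {k: math.exp(v) / z for k, v in scores.items()}

# Toy candidate tokens for one decoding step (scores are illustrative).
logits = {"granite": 1.0, "granted": 1.2, "grand": 0.8}
plain = softmax(logits)
biased = softmax(bias_logits(logits, keywords={"granite"}))
print(biased["granite"] > plain["granite"])  # → True: keyword gains probability
```

Because the boost is additive in log-space, it rescales the keyword's relative probability without zeroing out the alternatives, so acoustically better-supported words can still win.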
Latest Releases in the Last 72 Hours
DeepAgents (Langchain)
Trellis-KimiK2T (Workshop Labs)
Gemini CLI 0.33.0 (Google)
OB-1 (OpenBlock Labs)
DuClaw (Baidu)
Lightpanda Browser (Lightpanda)
A2A Protocol v1.0 (A2A)
Project Notebooks/Tutorials
▶ A Coding Implementation to Design an Enterprise AI Governance System Using OpenClaw Gateway Policy Engines, Approval Workflows and Auditable Agent Execution
▶ How to Build an Autonomous Machine Learning Research Loop in Google Colab Using Andrej Karpathy’s AutoResearch Framework for Hyperparameter Discovery and Experiment Tracking
▶ How to Design a Streaming Decision Agent with Partial Reasoning, Online Replanning, and Reactive Mid-Execution Adaptation in Dynamic Environments
▶ How to Build a Self-Designing Meta-Agent That Automatically Constructs, Instantiates, and Refines Task-Specific AI Agents
▶ How to Build a Risk-Aware AI Agent with Internal Critic, Self-Consistency Reasoning, and Uncertainty Estimation for Reliable Decision-Making
Upcoming AI Events
NVIDIA GTC 2026 [March 2026]
ICLR 2026 [April 2026]
AI Now Summit - Mistral AI [May 2026]
CVPR 2026 [June 2026]
ACL 2026 [July 2026]
ICML 2026 [July 2026]
EMNLP 2026 [Oct 2026]
and many more.