Here is today’s AI Dev Brief from Marktechpost, covering core research, models, infrastructure tools, and applied updates for AI developers and researchers.
Meta AI Open-Sourced Perception Encoder Audiovisual (PE-AV): The Audiovisual Encoder Powering SAM Audio And Large Scale Multimodal Retrieval
Perception Encoder Audiovisual (PE-AV) is Meta’s new open-source backbone for joint audio, video, and text understanding. It is trained with contrastive learning on roughly 100M audio-video pairs and released as six checkpoints that embed audio, video, audio-video, and text into a single space for cross-modal retrieval and classification. A related PE-A Frame variant provides frame-level audio-text embeddings for precise sound-event localization, and together they now power the perception layer inside Meta’s SAM Audio system and the broader Perception Models stack. Read the full article here.
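For intuition, here is a minimal sketch of what retrieval over a joint embedding space looks like. The embeddings below are random placeholders standing in for encoder outputs, and none of the names refer to the released PE-AV API.

```python
# Illustrative sketch (not the PE-AV API): retrieval over a shared embedding space.
# Embeddings here are random placeholders standing in for encoder outputs.
import numpy as np

rng = np.random.default_rng(0)
dim = 512

# Pretend these came from a video encoder and a text encoder that project
# into the same space.
video_embeddings = rng.normal(size=(1000, dim))   # candidate clips
text_query = rng.normal(size=(dim,))              # e.g. "dog barking in the rain"

# L2-normalize so the dot product equals cosine similarity.
video_embeddings /= np.linalg.norm(video_embeddings, axis=1, keepdims=True)
text_query /= np.linalg.norm(text_query)

# Rank clips by similarity to the text query and take the top 5.
scores = video_embeddings @ text_query
top5 = np.argsort(-scores)[:5]
print(top5, scores[top5])
```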

CopilotKit v1.50 Brings AG-UI Agents Directly Into Your App With the New useAgent Hook
Agent frameworks are now good at reasoning and tools, but most teams still write custom code to turn agent graphs into robust user interfaces with shared state, streaming output, and interrupts. CopilotKit targets this last mile: it is an open-source framework for building AI copilots and in-app agents directly in your app, with real-time context and UI control. The v1.50 release rebuilds the project natively on the Agent User Interaction Protocol (AG-UI). The key idea is simple: let AG-UI define all traffic between agents and UIs as a typed event stream, delivered to any app through a single hook, useAgent…
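To make the idea of a typed agent-to-UI event stream concrete, here is a small Python sketch of an agent emitting events that a UI layer consumes incrementally. The event names and fields are illustrative stand-ins, not the actual AG-UI schema or the useAgent API.

```python
# Conceptual sketch of an agent -> UI typed event stream in the spirit of AG-UI.
# Event names and fields are illustrative, not the protocol's actual schema.
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class Event:
    type: str
    payload: dict = field(default_factory=dict)

def agent_run() -> Iterator[Event]:
    """Yield a typed stream the UI can consume incrementally."""
    yield Event("run_started")
    yield Event("text_message_content", {"delta": "Looking up your order"})
    yield Event("state_delta", {"order_status": "shipped"})  # shared state update
    yield Event("run_finished")

# A UI layer (e.g. a hook like useAgent) would subscribe to this stream
# and map each event type to rendering or state updates.
for event in agent_run():
    print(event.type, event.payload)
```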
Google Introduces A2UI (Agent-to-User Interface): An Open Source Protocol for Agent-Driven Interfaces
Google’s A2UI is an open project that lets agents send declarative JSON descriptions of user interfaces while clients render them with native components such as Angular, Flutter, or web components. It uses a security-first “data not code” model and a flat, LLM-friendly representation that already works over the A2A and AG-UI transports, with reference implementations and samples available in the google/A2UI repo. Read the full article here.
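As a rough illustration of the “data not code” idea, the snippet below builds a made-up declarative UI payload; the component types and field names are hypothetical and do not reflect A2UI’s actual schema.

```python
# Illustrative only: a made-up payload showing the "data not code" idea,
# not A2UI's actual schema. The agent sends a flat, declarative description;
# the client maps component types to its own native widgets.
import json

ui_message = {
    "components": [
        {"id": "title",   "type": "Text",   "text": "Choose a delivery slot"},
        {"id": "slots",   "type": "Select", "options": ["Mon 9-12", "Tue 14-17"]},
        {"id": "confirm", "type": "Button", "label": "Confirm", "action": "submit_slot"},
    ]
}

# The client never evaluates code from the agent; it only interprets data.
print(json.dumps(ui_message, indent=2))
```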
Google DeepMind Researchers Release Gemma Scope 2 as a Full Stack Interpretability Suite for Gemma 3 Models
Gemma Scope 2 is an open interpretability suite from Google DeepMind that instruments every Gemma 3 model, from 270M to 27B parameters, with sparse autoencoders and transcoders trained on every layer, giving researchers a practical microscope on internal activations for studying jailbreaks, hallucinations, sycophancy, and other complex behaviors. Read the full article here.
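For readers new to sparse autoencoders, here is a toy sketch of the encode/decode interface they expose over model activations. The weights are random, so it only illustrates the shapes and the round trip, not the released Gemma Scope 2 checkpoints or their training setup.

```python
# Toy sketch of the sparse-autoencoder interface used in Gemma Scope-style
# interpretability work (random weights, not the released checkpoints).
# A trained SAE maps a model activation to a wide, mostly-zero feature vector
# whose active entries are interpretable; random weights only illustrate
# the shapes and the encode/decode round trip.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 256, 4096            # feature dictionary is much wider than the model dim

W_enc = rng.normal(scale=0.02, size=(d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(scale=0.02, size=(d_features, d_model))

activation = rng.normal(size=(d_model,))   # stands in for a residual-stream activation

features = np.maximum(activation @ W_enc + b_enc, 0.0)   # ReLU encoder -> nonnegative code
reconstruction = features @ W_dec                        # linear decoder back to model space

print("nonzero features:", int((features > 0).sum()), "of", d_features)
print("reconstruction error:", float(np.linalg.norm(activation - reconstruction)))
```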
Project Notebooks/Tutorials
▶ [Open Source] Rogue: An Open-Source AI Agent Evaluator worth trying Codes & Examples
▶ How to Build a Fully Autonomous Local Fleet-Maintenance Analysis Agent Using SmolAgents and Qwen Model Codes Tutorial
▶ A Coding Guide to Design a Complete Agentic Workflow in Gemini for Automated Medical Evidence Gathering and Prior Authorization Submission Codes Tutorial
▶ How to Orchestrate a Fully Autonomous Multi-Agent Research and Writing Pipeline Using CrewAI and Gemini for Real-Time Intelligent Collaboration Codes Tutorial