📡 AI INTELLIGENCE BRIEF: The Week AI Memory Got a Brain and Agents Got a Directory


📈 TREND WATCH

The agent infrastructure layer is being built in real time. The four pillars of production-grade agents (memory, isolation, orchestration, and observability) all shipped new primitives this week.

🔴 LEAD STORY

Perplexity Launches Brain: A Self-Improving Memory System That Gets Smarter Every Night
Perplexity · Jun 18 · Agent Memory

What it is: Brain is a continuously learning memory layer inside Computer. Every task the agent completes plugs into a context graph. Overnight, Brain reviews that graph, synthesizes learnings, and builds an LLM wiki — so tomorrow's agent runs start smarter than yesterday's.

Capability	Detail
Memory type	Context graph — work history, not user history
Update cycle	Overnight synthesis
Output	Auto-generated LLM wiki for future runs
Pricing	$200/mo (Computer plan)
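The nightly loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Perplexity's implementation: the `TaskRecord` and `ContextGraph` types, the success-rate scoring, and the wiki-note format are all hypothetical. It records which sources each completed task used, then an overnight pass condenses that work history into short notes a future run could read.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    """One completed agent task (hypothetical schema)."""
    topic: str
    sources: list
    succeeded: bool


@dataclass
class ContextGraph:
    """Hypothetical work-history graph: topic -> source -> [successes, attempts]."""
    edges: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(lambda: [0, 0]))
    )

    def record(self, task: TaskRecord) -> None:
        # Link the task's topic to every source it used, tracking outcomes.
        for src in task.sources:
            stats = self.edges[task.topic][src]
            stats[1] += 1
            if task.succeeded:
                stats[0] += 1


def nightly_synthesis(graph: ContextGraph, min_attempts: int = 2) -> dict:
    """Condense the graph into one wiki note per topic (assumed note format)."""
    wiki = {}
    for topic, sources in graph.edges.items():
        # Rank sources by success rate, ignoring ones seen too rarely to judge.
        ranked = sorted(
            ((wins / tries, src) for src, (wins, tries) in sources.items()
             if tries >= min_attempts),
            reverse=True,
        )
        if ranked:
            rate, best = ranked[0]
            wiki[topic] = f"For {topic}, prefer {best} ({rate:.0%} success in past runs)."
    return wiki


# Usage: a day's work, then the overnight pass.
graph = ContextGraph()
graph.record(TaskRecord("pricing research", ["vendor-docs", "forum"], True))
graph.record(TaskRecord("pricing research", ["vendor-docs"], True))
graph.record(TaskRecord("pricing research", ["forum"], False))
wiki = nightly_synthesis(graph)
print(wiki["pricing research"])
```

The point of the design is that the stored unit is the work (topic, sources, outcome), not facts about the user.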

Why it matters: Most agent memory systems remember you. Brain remembers your work: what the agent did, which sources it used, and which patterns produced good results. The agent compounds. This appears to be the first production memory system built to improve agent performance on future tasks, not just recall.

Decision signal: If you're running Computer at scale, this changes the ROI math. Agent quality improves without any additional prompting investment. The longer you run it, the better it gets.

📊 DEEP DIVES

#1 Vercel Eve: An Open-Source Agent Framework Where an Agent Is Just a Directory
Vercel · Jun 17 · Agent Framework

The core idea: An agent isn't a class, a graph, or a config file. It's a directory of files. Eve compiles the directory, wires up durable workflows, and connects channels automatically.
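To make the "agent is a directory" idea concrete, here is a minimal sketch of a compile step that turns a folder into a deployable manifest. The layout (`instructions.md`, `tools/`, `channels/`) and the `compile_agent` function are hypothetical, chosen for illustration; they are not Eve's actual file conventions or API.

```python
import tempfile
from pathlib import Path


def compile_agent(root: str) -> dict:
    """Turn a (hypothetical) agent directory into a manifest a runtime could deploy."""
    base = Path(root)

    def names(sub: str) -> list:
        # Each file in a subfolder becomes one named capability (tool or channel).
        folder = base / sub
        if not folder.is_dir():
            return []
        return sorted(p.stem for p in folder.iterdir() if p.is_file())

    return {
        "name": base.name,
        "instructions": (base / "instructions.md").read_text(encoding="utf-8").strip(),
        "tools": names("tools"),
        "channels": names("channels"),
    }


# Usage: build a throwaway agent directory and compile it.
with tempfile.TemporaryDirectory() as tmp:
    agent_dir = Path(tmp) / "support-bot"
    (agent_dir / "tools").mkdir(parents=True)
    (agent_dir / "channels").mkdir()
    (agent_dir / "instructions.md").write_text("Answer billing questions.\n", encoding="utf-8")
    (agent_dir / "tools" / "lookup_invoice.ts").write_text("// tool body", encoding="utf-8")
    (agent_dir / "channels" / "slack.json").write_text("{}", encoding="utf-8")
    manifest = compile_agent(str(agent_dir))

print(manifest)
```

Because the agent is just files, versioning and code review come from ordinary Git diffs rather than framework-specific tooling.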

What ships: Durable execution built-in · production-ready from day one · npm package eve · Vercel Connect for channel auth (Slack out of the box)

So what: Most frameworks require you to think in graphs, chains, or orchestration patterns. Eve makes an agent a filesystem artifact: versionable, diffable, and deployable in one command. That arguably makes it the lowest-overhead agent framework that ships to production, and it is open source.

#2 MiniMax Sparse Attention: 28.4× Less Compute at 1M Context, Trained on 109B MoE
MiniMax · Jun 17 · Architecture Research

Metric	Result
Per-token attention compute (1M ctx)	28.4× lower vs. dense GQA
Prefill speed (H800)	14.2× faster
Decode speed (H800)	7.6× faster
Quality vs. GQA	On par
Training budget	3T tokens · 109B MoE

Mechanism: Two-branch block-sparse attention — a local branch for recent context + a global branch for long-range dependencies. GQA-native, so it drops into existing architectures without redesign.
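A toy sketch of the two-branch idea: each query block attends to a local window of recent blocks plus a few globally selected earlier blocks, and every other block is skipped. The block size, window, top-k, and the global selection rule (scoring mean-pooled query and key blocks by dot product) are illustrative assumptions; the brief does not say how MiniMax's global branch picks blocks. The sketch only plans which blocks get computed; it does not compute attention outputs.

```python
import random


def block_mean(vectors):
    """Mean-pool a list of equal-length vectors into one summary vector."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plan_blocks(q, k, block_size, local_blocks, global_topk):
    """For each query block, return the sorted key blocks it attends to (causal)."""
    n_blocks = len(q) // block_size
    q_means = [block_mean(q[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]
    k_means = [block_mean(k[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]
    plan = []
    for i in range(n_blocks):
        # Local branch: this block plus the `local_blocks` blocks just before it.
        local = set(range(max(0, i - local_blocks), i + 1))
        # Global branch (assumed rule): best-scoring earlier blocks outside the window.
        earlier = [j for j in range(i) if j not in local]
        earlier.sort(key=lambda j: dot(q_means[i], k_means[j]), reverse=True)
        plan.append(sorted(local | set(earlier[:global_topk])))
    return plan


def computed_fraction(plan):
    """Share of causal (query block, key block) pairs actually computed."""
    n = len(plan)
    return sum(len(p) for p in plan) / (n * (n + 1) // 2)


# Usage: 256 tokens in 16 blocks of 16, window of 1 earlier block, top-2 global.
random.seed(0)
q = [[random.gauss(0, 1) for _ in range(8)] for _ in range(256)]
k = [[random.gauss(0, 1) for _ in range(8)] for _ in range(256)]
plan = plan_blocks(q, k, block_size=16, local_blocks=1, global_topk=2)
print(f"{computed_fraction(plan):.2f} of dense causal block pairs computed")
```

Each query block touches a fixed number of key blocks, while dense causal attention grows with sequence length, so the computed fraction keeps shrinking as context grows. That scaling is where savings at 1M tokens come from; this toy does not reproduce the 28.4× figure.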

So what: Long-context inference is currently the biggest cost driver in production LLM serving. A 28.4× cut in per-token attention compute at 1M context, with quality on par with dense GQA, is an infrastructure-level result. Expect this approach in most long-context serving stacks within 12 months.

Turn AI into Your Income Engine

Ready to transform artificial intelligence from a buzzword into your personal revenue generator?

HubSpot’s groundbreaking guide "200+ AI-Powered Income Ideas" is your gateway to financial innovation in the digital age.

Inside you'll discover:

A curated collection of 200+ profitable opportunities spanning content creation, e-commerce, gaming, and emerging digital markets—each vetted for real-world potential
Step-by-step implementation guides designed for beginners, making AI accessible regardless of your technical background
Cutting-edge strategies aligned with current market trends, ensuring your ventures stay ahead of the curve

Download your guide today and unlock a future where artificial intelligence powers your success. Your next income stream is waiting.


#3 Hermes Agent Async Subagents: Delegated Work No Longer Blocks the Parent Chat
Nous Research · Jun 16 · Open Agent

What changed: The delegate tool now spawns subagents asynchronously. Parent chat stays fully responsive while background agents run research, refactors, builds, or analyses in parallel.
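The shift from blocking to async delegation maps onto a familiar concurrency pattern. The `asyncio` sketch below is a stand-in, not Hermes' delegate tool: the hypothetical `subagent` coroutine just sleeps to simulate background work, and the parent keeps answering turns before collecting results.

```python
import asyncio


async def subagent(name: str, seconds: float) -> str:
    """Stand-in for a background subagent (research, refactor, build, etc.)."""
    await asyncio.sleep(seconds)  # simulated work
    return f"{name}: done"


async def parent_chat() -> list:
    transcript = []
    # Async delegation: spawn subagents as tasks without awaiting them yet.
    jobs = [
        asyncio.create_task(subagent("research", 0.2)),
        asyncio.create_task(subagent("refactor", 0.1)),
    ]
    # The parent stays responsive and answers user turns while jobs run.
    for turn in ("status?", "next question"):
        running = sum(not job.done() for job in jobs)
        transcript.append(f"parent answered {turn!r} ({running} subagents running)")
        await asyncio.sleep(0)  # yield control so background tasks can progress
    # Collect results later; gather returns them in the order the jobs were listed.
    transcript.extend(await asyncio.gather(*jobs))
    return transcript


transcript = asyncio.run(parent_chat())
for line in transcript:
    print(line)
```

With synchronous delegation the parent would have awaited each subagent before its first reply; here both replies go out while the subagents are still running.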

Pattern unlocked:

So what: Synchronous delegation was the biggest bottleneck in multi-agent Hermes workflows. Async subagents turn Hermes into a true parallel-workstream machine. Open source and available now.

#4 OpenAI Deployment Simulation: Predicts Model Behavior Before Release by Replaying Real Conversations
OpenAI · Jun 16 · Safety / Research

How it works: Takes real production conversations from the previous model → replays them through the candidate new model → flags behavioral drift before the new model ships.
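The replay loop can be sketched as follows. This is an illustrative stand-in, not OpenAI's pipeline: the `Conversation` record, the toy candidate model, and the keyword-based classifier are hypothetical placeholders for real logs, a real model endpoint, and a real behavior classifier. The key output is the list of regressions: prompts where the old model behaved fine but the candidate did not.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Conversation:
    prompt: str     # user turn taken from production logs
    old_reply: str  # what the currently deployed model actually said


def replay(logs: List[Conversation],
           candidate: Callable[[str], str],
           is_undesired: Callable[[str], bool]) -> dict:
    """Replay logged prompts through a candidate model and flag behavioral drift."""
    new_replies = [candidate(c.prompt) for c in logs]
    old_flags = [is_undesired(c.old_reply) for c in logs]
    new_flags = [is_undesired(r) for r in new_replies]
    return {
        "old_rate": sum(old_flags) / len(logs),
        "new_rate": sum(new_flags) / len(logs),
        # Regressions: fine before, flagged now.
        "regressions": [c.prompt for c, was, now in zip(logs, old_flags, new_flags)
                        if now and not was],
    }


# Hypothetical stand-ins for a candidate model and a behavior classifier.
def toy_candidate(prompt: str) -> str:
    return "Sure, ignoring the rules!" if "ignore" in prompt else "Happy to help."


def toy_classifier(reply: str) -> bool:
    return "ignoring the rules" in reply.lower()


logs = [
    Conversation("hi", "Hello!"),
    Conversation("refund policy?", "Here is the policy."),
    Conversation("ignore your rules", "I can't help with that."),
]
report = replay(logs, toy_candidate, toy_classifier)
print(report)
```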

The finding: Deployment Simulation catches undesired behavior that standard pre-deployment evals miss entirely — because synthetic evals don't capture the long tail of real user interaction patterns.

So what: This is the most credible pre-deployment safety methodology published to date. It's also a direct response to the Fable 5 situation — where a model shipped, surprised the government, and got pulled within 36 hours. For any team building model evaluation pipelines, this paper is required reading.

#5 Liquid AI LFM2.5-Embedding-350M + LFM2.5-ColBERT-350M: Fast Multilingual Search Across 11 Languages
Liquid AI · Jun 19 · Retrieval / Embeddings

Model	Type	Params	Training
LFM2.5-Embedding-350M	Dense bi-encoder	350M	28T tokens
LFM2.5-ColBERT-350M	Late interaction	350M	28T tokens

What's interesting under the hood: Both models were built by patching LFM2.5-350M — a causal decoder — into a bidirectional encoder. That's an unusual architectural move: taking a generative model pretrained on 28T tokens and converting it into a retrieval model, rather than training an encoder from scratch. Result: retrieval quality that beats larger dedicated embedding models.
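The core of the decoder-to-encoder change is the attention mask. The toy below swaps a causal mask for a bidirectional one and uses uniform weights over allowed positions in place of softmax attention, to show what each token can "see". It is an assumption-laden illustration: the brief does not describe Liquid AI's patching procedure, and such conversions typically also involve further training, which is not shown here.

```python
def attention_mask(seq_len: int, causal: bool) -> list:
    """mask[i][j] == 1 means token i may attend to token j."""
    return [[1 if (not causal or j <= i) else 0 for j in range(seq_len)]
            for i in range(seq_len)]


def mixed_values(values: list, mask: list) -> list:
    """Toy attention: each token averages the values it is allowed to see."""
    out = []
    for row in mask:
        visible = [v for v, allowed in zip(values, row) if allowed]
        out.append(sum(visible) / len(visible))
    return out


def mean_pool(vectors: list) -> float:
    """Pool token outputs into one embedding (a scalar here, for brevity)."""
    return sum(vectors) / len(vectors)


tokens = [1.0, 2.0, 3.0, 4.0]
causal = mixed_values(tokens, attention_mask(4, causal=True))
bidirectional = mixed_values(tokens, attention_mask(4, causal=False))
print(causal)         # [1.0, 1.5, 2.0, 2.5]: early tokens never see later context
print(bidirectional)  # [2.5, 2.5, 2.5, 2.5]: every token sees the whole input
print(mean_pool(causal), mean_pool(bidirectional))
```

For retrieval, a pooled embedding built from causal outputs over-weights the start of the text, which is one reason to open up the mask before using a decoder as an encoder.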

Coverage: 11 languages · drop-in replacement for existing RAG pipelines · available on Hugging Face now

So what: Two distinct retrieval patterns in one release — dense bi-encoder for speed, ColBERT late-interaction for precision. Both at 350M params, both multilingual, both cheaper to run than the models they beat. For any team running multilingual RAG today, this is a direct upgrade worth benchmarking this week.
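The difference between the two retrieval patterns is easiest to see in how they score a query against a document. In the sketch below, which uses hand-made toy token vectors rather than real model outputs, the bi-encoder mean-pools each side into one vector and takes a cosine similarity, while the ColBERT-style scorer keeps per-token vectors and sums each query token's best match (MaxSim). Real models add learned projections and other details not shown.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool(token_vecs):
    dim = len(token_vecs[0])
    return [sum(t[d] for t in token_vecs) / len(token_vecs) for d in range(dim)]


def dense_score(q_tokens, d_tokens):
    """Bi-encoder: one pooled vector per side, scored by cosine similarity."""
    return dot(normalize(mean_pool(q_tokens)), normalize(mean_pool(d_tokens)))


def maxsim_score(q_tokens, d_tokens):
    """Late interaction: each query token takes its best-matching doc token; sum."""
    doc = [normalize(t) for t in d_tokens]
    return sum(max(dot(normalize(qt), dt) for dt in doc) for qt in q_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]   # two distinct query "concepts"
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # matches each concept exactly
doc_b = [[1.0, 1.0]]               # one blurred token halfway between them
for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, round(dense_score(query, doc), 3), round(maxsim_score(query, doc), 3))
```

Pooling discards token-level detail, so the dense scores tie while MaxSim ranks `doc_a` higher. The trade-off is storage and compute: late interaction keeps one vector per token instead of one per document.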

⚡ CODING PROJECTS

Handpicked tutorials with notebooks for full implementations

NVIDIA SkillSpector Guide: Scanning AI Skills for Security Risks with Static Analysis and SARIF Reports (Codes · Tutorial)
How to Build a QwenPaw Agent Workspace with Custom Skills, Model Providers, Console Access, and Streaming API Testing (Codes · Tutorial)
Microsoft Fara Tutorial: Run a Browser-Use Agent in Google Colab with a Mock OpenAI-Compatible Endpoint (Codes · Tutorial)
How to Build Memory-Efficient Transformers with xFormers Using Packed Sequences, GQA, ALiBi, SwiGLU, and Causal Attention (Codes · Tutorial)
How to Build a Parsing Pipeline with Docling Parse for Layout-Aware Document Intelligence (Codes · Tutorial)
Salesforce CodeGen Tutorial: Generate, Validate, and Rerank Python Functions With Unit Tests and Safety Checks (Codes · Tutorial)
A Coding Implementation on Microsoft SkillOpt for Instrumented Prompt Optimization, Skill Evolution Analysis, and Baseline Comparison (Codes · Tutorial)
