📈 TREND WATCH
The agent infrastructure layer is being built in real-time. Memory, isolation, orchestration, and observability — the four pillars of production-grade agents — all shipped new primitives this week.

🔴 LEAD STORY
Perplexity Launches Brain: A Self-Improving Memory System That Gets Smarter Every Night
Perplexity · Jun 18 · Agent Memory
What it is: Brain is a continuously learning memory layer inside Computer. Every task the agent completes plugs into a context graph. Overnight, Brain reviews that graph, synthesizes learnings, and builds an LLM wiki — so tomorrow's agent runs start smarter than yesterday's.
| Capability | Detail |
|---|---|
| Memory type | Context graph (work history, not user history) |
| Update cycle | Overnight synthesis |
| Output | Auto-generated LLM wiki for future runs |
| Pricing | $200/mo (Computer plan) |
Why it matters: Every other agent memory system remembers you. Brain remembers your work: what the agent did, which sources it used, and which patterns produced good results. The agent's gains compound from run to run. This is the first production memory system that improves agent performance on future tasks, not just recall.
Decision signal: If you're running Computer at scale, this changes the ROI math. Agent quality improves without any additional prompting investment. The longer you run it, the better it gets.
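A minimal sketch of that loop, assuming the context graph can be modeled as task records linked by the sources they used. Every name here (`TaskRecord`, `ContextGraph`, `nightly_synthesis`, `build_prompt_prefix`) is hypothetical and not Perplexity's API. The real system presumably uses an LLM for the synthesis step; this sketch substitutes simple counting so it stays runnable.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    """One completed agent task: what was asked, what it used, how it went."""
    task: str
    sources: list[str]
    succeeded: bool


@dataclass
class ContextGraph:
    """Hypothetical work-history store (not Perplexity's data model)."""
    records: list[TaskRecord] = field(default_factory=list)

    def add(self, record: TaskRecord) -> None:
        self.records.append(record)


def nightly_synthesis(graph: ContextGraph, min_uses: int = 2) -> dict[str, str]:
    """Summarize how often each source appeared, and in how many successful runs.

    Returns a tiny 'wiki': one entry per source used in at least `min_uses` tasks.
    """
    uses = Counter()
    wins = Counter()
    for rec in graph.records:
        for src in set(rec.sources):
            uses[src] += 1
            if rec.succeeded:
                wins[src] += 1
    wiki = {}
    for src, n in uses.items():
        if n >= min_uses:
            wiki[src] = f"used in {n} tasks, {wins[src]} successful"
    return wiki


def build_prompt_prefix(wiki: dict[str, str]) -> str:
    """Turn the wiki into context that the next day's runs start with."""
    lines = [f"- {src}: {note}" for src, note in sorted(wiki.items())]
    return "Learned from prior work:\n" + "\n".join(lines)
```

The point of the sketch is the shape of the data: the store is keyed on work (tasks, sources, outcomes) rather than on user preferences, and the synthesis output is fed back in as context for later runs.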
📊 DEEP DIVES
#1 Vercel: Eve, an Open-Source Agent Framework Where an Agent Is Just a Directory
Vercel · Jun 17 · Agent Framework
The core idea: An agent isn't a class, a graph, or a config file. It's a directory of files. Eve compiles the directory, wires up durable workflows, and connects channels automatically.
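To make the "agent as a directory" idea concrete, here is a rough Python sketch of compiling a folder into an agent definition. This is not Eve's API or file layout (Eve ships as an npm package, and its actual conventions are not described here). The `instructions.md` file, the `tools/` folder, and the `AgentSpec` and `compile_agent` names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AgentSpec:
    """What a compiler might extract from an agent directory (hypothetical)."""
    name: str
    instructions: str
    tools: list[str]


def compile_agent(root: Path) -> AgentSpec:
    """Build an agent definition purely from what is on disk.

    Assumed layout (illustrative only, not Eve's):
      <root>/instructions.md   system prompt
      <root>/tools/<name>.*    one file per tool
    """
    instructions_file = root / "instructions.md"
    instructions = ""
    if instructions_file.exists():
        instructions = instructions_file.read_text(encoding="utf-8").strip()

    tools_dir = root / "tools"
    tools = []
    if tools_dir.is_dir():
        # The tool name is taken from the file name, without its extension.
        tools = sorted(p.stem for p in tools_dir.iterdir() if p.is_file())

    # The directory name doubles as the agent name.
    return AgentSpec(name=root.name, instructions=instructions, tools=tools)
```

Because the whole definition lives in files, adding a tool is a new file in a pull request, and a diff of the directory is a diff of the agent.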

What ships: Durable execution built-in · production-ready from day one · npm package eve · Vercel Connect for channel auth (Slack out of the box)
So what: Most other frameworks require you to think in graphs, chains, or orchestration patterns. Eve makes an agent a filesystem artifact: versionable, diffable, and deployable in one command. That makes it arguably the lowest-cognitive-overhead agent framework that ships to production, and it is open source.
#2 MiniMax Sparse Attention: 28.4× Less Compute at 1M Context, Trained on a 109B MoE
MiniMax · Jun 17 · Architecture Research
| Metric | Result |
|---|---|
| Per-token attention compute (1M ctx) | 28.4× lower vs. dense GQA |
| Prefill speed (H800) | 14.2× faster |
| Decode speed (H800) | 7.6× faster |
| Quality vs. GQA | On par |
| Training budget | 3T tokens · 109B MoE |
Mechanism: Two-branch block-sparse attention — a local branch for recent context + a global branch for long-range dependencies. GQA-native, so it drops into existing architectures without redesign.
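A toy sketch of a two-branch block-sparse pattern, to show where the savings come from. How MiniMax actually selects global blocks is not stated above; this sketch assumes a top-k pick over per-block relevance scores, which is one common approach. The function names, the `scores` matrix, and both budget parameters are hypothetical.

```python
def sparse_block_pattern(scores, local_blocks, global_blocks):
    """For each query block, pick which key blocks to attend to.

    scores[q][k] is an assumed relevance score between query block q and
    key block k (for example, from pooled block summaries). Only causal
    pairs (k <= q) are considered.
    """
    pattern = []
    for q, row in enumerate(scores):
        # Local branch: the most recent `local_blocks` blocks, including q itself.
        local = set(range(max(0, q - local_blocks + 1), q + 1))
        # Global branch: top-scoring earlier blocks outside the local window.
        candidates = [k for k in range(q + 1) if k not in local]
        candidates.sort(key=lambda k: row[k], reverse=True)
        chosen = local | set(candidates[:global_blocks])
        pattern.append(sorted(chosen))
    return pattern


def compute_ratio(pattern):
    """Dense causal block pairs divided by block pairs the sparse pattern visits."""
    n = len(pattern)
    dense = n * (n + 1) // 2
    sparse = sum(len(blocks) for blocks in pattern)
    return dense / sparse
```

With a fixed per-query budget of local plus global blocks, the sparse cost grows linearly with the number of blocks while dense causal attention grows quadratically, so the ratio widens as context gets longer. The toy does not attempt to reproduce the reported figures.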
So what: Long-context inference is one of the biggest cost drivers in production LLM serving. A 28.4× reduction in per-token attention compute at 1M context, with quality on par with dense GQA, is an infrastructure-level result. Expect it to show up in long-context serving stacks within 12 months.
#3 Hermes Agent: Async Subagents Mean Delegated Work No Longer Blocks the Parent Chat
Nous Research · Jun 16 · Open Agent
What changed: The delegate tool now spawns subagents asynchronously. Parent chat stays fully responsive while background agents run research, refactors, builds, or analyses in parallel.
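The general pattern can be sketched with Python's `asyncio`. This is not Hermes code and does not use its delegate tool. `subagent` and `parent_chat` are hypothetical stand-ins, and the sleep calls simulate work.

```python
import asyncio


async def subagent(name: str, seconds: float) -> str:
    """Stand-in for a delegated job (research, refactor, build)."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def parent_chat() -> list[str]:
    """Spawn subagents in the background and keep handling messages."""
    log: list[str] = []
    # create_task schedules the work and returns immediately,
    # so the parent is not blocked while subagents run.
    tasks = [
        asyncio.create_task(subagent("research", 0.05)),
        asyncio.create_task(subagent("refactor", 0.02)),
    ]
    # The parent stays responsive: it answers a message before any subagent finishes.
    log.append("parent: replied to user")
    # Collect results in completion order, not submission order.
    for finished in asyncio.as_completed(tasks):
        log.append(await finished)
    return log
```

Running `asyncio.run(parent_chat())` logs the parent's reply first, then each subagent's result as it lands. A synchronous version would instead wait on each job in turn before the parent could respond.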
So what: Synchronous delegation was the biggest bottleneck in multi-agent Hermes workflows. Async subagents let Hermes run truly parallel workstreams. Open source and available now.
#4 OpenAI: Deployment Simulation Predicts Model Behavior Before Release by Replaying Real Conversations
OpenAI · Jun 16 · Safety / Research
How it works: Takes real production conversations from the previous model → replays them through the candidate new model → flags behavioral drift before the new model ships.
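The replay step can be sketched in a few lines. This is an illustration of the idea only, not OpenAI's pipeline. It assumes logs are available as (prompt, previous reply) pairs and that some checker can label a reply as undesired. The names `replay`, `drift_rate`, and `is_undesired` are hypothetical.

```python
from typing import Callable


def replay(
    logged: list[tuple[str, str]],
    candidate: Callable[[str], str],
    is_undesired: Callable[[str], bool],
) -> list[dict]:
    """Replay logged prompts through the candidate model and flag regressions.

    `logged` holds (prompt, reply from the previous model) pairs.
    """
    flagged = []
    for prompt, old_reply in logged:
        new_reply = candidate(prompt)
        # Drift of interest: the candidate misbehaves where the previous model did not.
        if is_undesired(new_reply) and not is_undesired(old_reply):
            flagged.append({"prompt": prompt, "old": old_reply, "new": new_reply})
    return flagged


def drift_rate(flagged: list[dict], total: int) -> float:
    """Share of replayed conversations that regressed."""
    return len(flagged) / total if total else 0.0
```

The value comes from the input distribution: the prompts are real traffic rather than hand-written test cases, so rare interaction patterns are represented at their true frequency.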
The finding: Deployment Simulation catches undesired behavior that standard pre-deployment evals miss entirely — because synthetic evals don't capture the long tail of real user interaction patterns.
So what: This is the most credible pre-deployment safety methodology published to date. It's also a direct response to the Fable 5 situation — where a model shipped, surprised the government, and got pulled within 36 hours. For any team building model evaluation pipelines, this paper is required reading.
#5 Liquid AI: LFM2.5-Embedding-350M + LFM2.5-ColBERT-350M for Fast Multilingual Search Across 11 Languages
Liquid AI · Jun 19 · Retrieval / Embeddings
| Model | Type | Params | Training |
|---|---|---|---|
| LFM2.5-Embedding-350M | Dense bi-encoder | 350M | 28T tokens |
| LFM2.5-ColBERT-350M | Late interaction | 350M | 28T tokens |
What's interesting under the hood: Both models were built by patching LFM2.5-350M — a causal decoder — into a bidirectional encoder. That's an unusual architectural move: taking a generative model pretrained on 28T tokens and converting it into a retrieval model, rather than training an encoder from scratch. Result: retrieval quality that beats larger dedicated embedding models.
Coverage: 11 languages · drop-in replacement for existing RAG pipelines · available on Hugging Face now
So what: Two distinct retrieval patterns in one release — dense bi-encoder for speed, ColBERT late-interaction for precision. Both at 350M params, both multilingual, both cheaper to run than the models they beat. For any team running multilingual RAG today, this is a direct upgrade worth benchmarking this week.
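For readers comparing the two patterns, here is a dependency-free sketch of how each one scores a query against a document. It uses toy per-token vectors rather than the Liquid AI models. Mean pooling for the dense path is an assumption (the release's pooling method is not stated above), while the late-interaction path follows the standard ColBERT MaxSim rule.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def mean_pool(token_vecs):
    """Collapse per-token vectors into one vector (assumed pooling choice)."""
    dim = len(token_vecs[0])
    return [sum(v[i] for v in token_vecs) / len(token_vecs) for i in range(dim)]


def dense_score(query_tokens, doc_tokens):
    """Bi-encoder: one vector per text, compared with cosine similarity."""
    q = normalize(mean_pool(query_tokens))
    d = normalize(mean_pool(doc_tokens))
    return dot(q, d)


def late_interaction_score(query_tokens, doc_tokens):
    """ColBERT-style MaxSim: each query token keeps its best-matching doc token."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)
```

The trade-off shows up in storage and compute: the dense path stores one vector per document and does one dot product per candidate, while late interaction stores one vector per token and does a token-by-token comparison, which costs more but keeps fine-grained matches that pooling averages away.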
⚡ CODING PROJECTS
Handpicked tutorials with notebooks for full implementations


