AI Dev and Latest Releases

[Open Source] GibsonAI Releases Memori: An Open-Source SQL-Native Memory Engine for AI Agents. GibsonAI’s Memori is an open-source, SQL-native memory engine that equips AI agents with persistent, transparent memory by using standard relational databases like SQLite, PostgreSQL, or MySQL. It replaces costly, opaque vector-database setups with writable, queryable, ACID-compliant storage—yielding 80–90% lower infrastructure costs, 2–4× faster query times, full auditability, and zero vendor lock-in.
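
The project’s exact API isn’t shown here, but the SQL-native idea is easy to picture: agent memories live as ordinary rows in a relational database and are recalled with plain SQL. The sketch below is a minimal illustration using Python’s built-in sqlite3; the table layout and function names are hypothetical, not Memori’s.

```python
# Illustrative sketch only, not Memori's actual API. It shows the core idea of
# SQL-native agent memory: plain, queryable rows in SQLite instead of a vector store.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("agent_memory.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS memories (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        session_id TEXT NOT NULL,
        role TEXT NOT NULL,            -- 'user', 'assistant', 'tool', ...
        content TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
""")

def remember(session_id: str, role: str, content: str) -> None:
    """Persist one memory row; ACID guarantees come from the database itself."""
    conn.execute(
        "INSERT INTO memories (session_id, role, content, created_at) VALUES (?, ?, ?, ?)",
        (session_id, role, content, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

def recall(session_id: str, keyword: str, limit: int = 5) -> list[str]:
    """Transparent retrieval: an ordinary, auditable SQL query."""
    rows = conn.execute(
        "SELECT content FROM memories "
        "WHERE session_id = ? AND content LIKE ? "
        "ORDER BY created_at DESC LIMIT ?",
        (session_id, f"%{keyword}%", limit),
    ).fetchall()
    return [r[0] for r in rows]

remember("demo", "user", "My preferred deployment target is Postgres on EU servers.")
print(recall("demo", "Postgres"))
```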

[RL] A New MIT Study Shows Reinforcement Learning Minimizes Catastrophic Forgetting Compared to Supervised Fine-Tuning. The researchers introduce RL’s Razor, showing that reinforcement learning (RL) preserves prior knowledge better than supervised fine-tuning (SFT). Their study demonstrates that catastrophic forgetting is strongly predicted by the KL divergence between the fine-tuned and base model, measured on the new task. Unlike SFT, which can push models far from their original distribution, RL’s on-policy updates bias toward KL-minimal solutions, enabling new skills while retaining old ones. Experiments across large language models and robotics confirm RL’s robustness, positioning KL divergence as a practical principle for designing continual learning methods.
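
As a rough illustration of the quantity the study tracks, the sketch below estimates the per-token KL divergence KL(π_fine-tuned ‖ π_base) on new-task prompts with Hugging Face transformers; the model pair and prompts are stand-ins, not the paper’s setup.

```python
# Sketch of the forgetting predictor highlighted by the study: forward KL between the
# fine-tuned and base policies, averaged over tokens of new-task prompts.
# The base/instruct model pair and the prompt are placeholders for illustration.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "Qwen/Qwen2.5-0.5B"            # stand-in base model
ft_name = "Qwen/Qwen2.5-0.5B-Instruct"     # stand-in for a fine-tuned checkpoint

tok = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name).eval()
ft = AutoModelForCausalLM.from_pretrained(ft_name).eval()

@torch.no_grad()
def mean_token_kl(prompts: list[str]) -> float:
    """Average per-token KL(pi_ft || pi_base) over the given new-task prompts."""
    total, count = 0.0, 0
    for text in prompts:
        ids = tok(text, return_tensors="pt")
        log_p_ft = F.log_softmax(ft(**ids).logits, dim=-1)
        log_p_base = F.log_softmax(base(**ids).logits, dim=-1)
        # F.kl_div(input=log q, target=log p, log_target=True) -> p * (log p - log q)
        kl = F.kl_div(log_p_base, log_p_ft, log_target=True, reduction="none").sum(-1)
        total += kl.sum().item()
        count += kl.numel()
    return total / max(count, 1)

# Larger values predict more forgetting of prior capabilities.
print(mean_token_kl(["Solve step by step: 17 * 24 = ?"]))
```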

[LLM] ParaThinker, introduced by researchers at Tsinghua University, addresses the test-time compute bottleneck in large language models (LLMs) caused by “Tunnel Vision,” where early tokens lock models into suboptimal reasoning paths. Instead of extending a single chain-of-thought, ParaThinker generates multiple diverse reasoning trajectories in parallel and fuses them into a final answer. Its architecture integrates specialized control tokens, thought-specific positional embeddings, and KV-cache reuse to maintain both accuracy and efficiency. On benchmarks such as AIME 2024/2025, AMC 2023, and MATH-500, ParaThinker improves accuracy by 12.3% (1.5B model) and 7.5% (7B model) over sequential baselines while adding only ~7% latency. This demonstrates that scaling reasoning in width—parallel thought exploration—outperforms traditional depth scaling, allowing smaller models to surpass much larger counterparts.
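
ParaThinker’s control tokens, thought-specific positional embeddings, and KV-cache reuse are not reproduced below; the sketch only conveys the width-scaling intuition with an off-the-shelf model: sample several independent reasoning paths, then fuse them with a second prompt. The model choice and prompts are placeholders.

```python
# Conceptual sketch of width-wise test-time scaling (not ParaThinker's architecture):
# sample several independent reasoning trajectories, then fuse them in a second pass.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder small model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

def generate(prompt: str, n: int = 1) -> list[str]:
    ids = tok(prompt, return_tensors="pt")
    out = model.generate(
        **ids, do_sample=True, temperature=0.8, top_p=0.95,
        max_new_tokens=256, num_return_sequences=n,
        pad_token_id=tok.eos_token_id,
    )
    prompt_len = ids["input_ids"].shape[1]
    return [tok.decode(o[prompt_len:], skip_special_tokens=True) for o in out]

question = "If 3x + 5 = 26, what is x? Think step by step."
paths = generate(question, n=4)  # parallel, diverse reasoning trajectories

fusion_prompt = (
    f"Question: {question}\n\n"
    + "\n\n".join(f"Candidate reasoning {i+1}:\n{p}" for i, p in enumerate(paths))
    + "\n\nUsing the candidates above, give the single best final answer."
)
print(generate(fusion_prompt, n=1)[0])  # fuse the parallel paths into one answer
```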

[LLM] Tilde has released TildeOpen LLM, a 30B-parameter multilingual model trained on EU supercomputers to support European languages, particularly under-represented ones such as Latvian, Lithuanian, and Ukrainian. Built with an equitable tokenizer and trained on ~2 trillion tokens, it ensures fair language representation and efficient inference. Open-sourced under CC-BY-4.0, the model enables GDPR-compliant self-hosting in local or EU clouds, reinforcing Europe’s data sovereignty. Positioned as a foundational model, TildeOpen will serve as the basis for specialized AI systems in translation, education, government, and industry, marking a key step in Europe’s sovereign AI infrastructure.

[OpenAI] From Pretraining to Post-Training: Why Language Models Hallucinate and How Evaluation Methods Reinforce the Problem. Hallucinations in large language models are not mysterious flaws but statistically predictable errors that arise from the way models are trained and evaluated. During pretraining, even with perfectly clean data, cross-entropy optimization creates misclassification-like pressures that guarantee certain mistakes, especially on rare “singleton” facts seen only once in training. Post-training compounds the issue because most benchmarks use binary grading schemes that penalize abstaining (“I don’t know”) as much as being wrong, incentivizing models to guess confidently rather than admit uncertainty. This misalignment means leaderboards reward bluffing behavior, reinforcing hallucinations instead of suppressing them. The research suggests that reforming mainstream evaluations—by introducing explicit confidence thresholds and partial credit for abstention—could realign incentives, encouraging behavioral calibration and reducing overconfident falsehoods in practical deployments.
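
A back-of-the-envelope look at that incentive gap, assuming a confidence-threshold rule of the kind described above (correct = 1, abstain = 0, wrong answers penalized t/(1−t)):

```python
# Expected-score comparison under the two grading schemes discussed above.
# Binary grading: correct = 1, wrong = 0, abstain = 0 -> guessing never loses.
# Threshold grading (confidence target t): correct = 1, wrong = -t / (1 - t),
# abstain = 0 -> guessing only pays off when the model's confidence p exceeds t.

def expected_score_binary(p: float) -> float:
    """Expected score of answering with confidence p under binary grading."""
    return p * 1.0 + (1 - p) * 0.0

def expected_score_threshold(p: float, t: float) -> float:
    """Expected score of answering with confidence p when wrong answers cost t/(1-t)."""
    return p * 1.0 - (1 - p) * (t / (1 - t))

for p in (0.3, 0.6, 0.8, 0.95):
    binary = expected_score_binary(p)
    thresh = expected_score_threshold(p, t=0.75)
    decision = "answer" if thresh > 0 else "abstain"
    print(f"confidence={p:.2f}  binary={binary:+.2f}  threshold={thresh:+.2f} -> {decision}")

# Under binary grading, every confidence level beats abstaining (score 0), so models
# are pushed to guess; with the threshold rule, abstaining wins whenever p < t.
```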

[Information Retrieval] Meet ARGUS: A Scalable AI Framework for Training Large Recommender Transformers to One Billion Parameters. Yandex has introduced ARGUS (AutoRegressive Generative User Sequential modeling), a large-scale transformer-based framework for recommender systems that scales up to one billion parameters. With ARGUS, Yandex joins the small group of companies, alongside Google, Netflix, and Meta, that have overcome the long-standing engineering barriers to scaling recommender transformers to this size.
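
Implementation details aren’t covered in the announcement; purely as a generic illustration of autoregressive next-item modeling over user interaction sequences, here is a toy PyTorch sketch. The sizes and architecture are illustrative, not Yandex’s.

```python
# Generic sketch of autoregressive next-item prediction for recommendations,
# the modeling style ARGUS's name refers to. Toy sizes; not Yandex's system.
import torch
import torch.nn as nn

class NextItemTransformer(nn.Module):
    def __init__(self, num_items: int, dim: int = 64, heads: int = 4,
                 layers: int = 2, max_len: int = 128):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, dim)
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, num_items)

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        seq_len = item_ids.size(1)
        pos = torch.arange(seq_len, device=item_ids.device)
        x = self.item_emb(item_ids) + self.pos_emb(pos)
        # Causal mask: each position only attends to earlier interactions.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        hidden = self.encoder(x, mask=mask)
        return self.head(hidden)  # logits over the item catalog at each step

model = NextItemTransformer(num_items=1000)
history = torch.randint(0, 1000, (8, 20))   # batch of user interaction histories
logits = model(history[:, :-1])             # predict each next item autoregressively
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), history[:, 1:].reshape(-1))
print(loss.item())
```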

Editor’s Pick

[RAG & Efficiency] Meta Superintelligence Labs has introduced REFRAG, a decoding framework that re-engineers retrieval-augmented generation (RAG) for long-context efficiency. By compressing retrieved passages into chunk embeddings, selectively expanding critical ones via reinforcement learning, and shortening the decoder’s input sequence, REFRAG reduces quadratic attention costs and shrinks the KV cache. The result is a 16× context extension and up to 30.85× faster time-to-first-token (TTFT) compared to LLaMA baselines, all while maintaining or improving accuracy over prior methods like CEPE. This makes large-context RAG applications practical for both production-scale AI systems and research.
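
REFRAG’s code and learned expansion policy aren’t shown here; the sketch below only illustrates the compression intuition: score retrieved chunks, expand a few critical ones in full, and collapse the rest to compact placeholders standing in for chunk embeddings. The word-overlap scorer is a toy stand-in for the RL policy.

```python
# Intuition-level sketch of REFRAG-style context compression (not Meta's code):
# score retrieved chunks against the query, expand only the top-k as full text,
# and shrink the rest to short placeholders so the decoder sees far fewer tokens.
# The overlap scorer below is a stand-in for the learned RL expansion policy.

def score(query: str, chunk: str) -> float:
    """Toy relevance score: word overlap (REFRAG instead learns which chunks to expand)."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def compress_context(query: str, chunks: list[str], k: int = 2) -> str:
    ranked = sorted(range(len(chunks)), key=lambda i: score(query, chunks[i]), reverse=True)
    keep = set(ranked[:k])
    parts = []
    for i, chunk in enumerate(chunks):
        if i in keep:
            parts.append(chunk)                      # critical chunk: expanded in full
        else:
            parts.append(f"[compressed chunk {i}]")  # others: compact placeholder
    return "\n".join(parts)

chunks = [
    "REFRAG compresses retrieved passages into chunk embeddings.",
    "Unrelated passage about the history of printing presses.",
    "Selective expansion keeps only the chunks the policy marks as critical.",
    "Another distractor passage about weather patterns.",
]
print(compress_context("how does REFRAG expand critical chunks", chunks, k=2))
```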
