
In this edition of our newsletter, we bring you the latest updates on language and vision models. Discover the new and improved Flan-UL2 20B from Google, the largest vision-language model to date, PaLM-E, and the controversy surrounding ChatGPT's parameter count. We also introduce you to VoxFormer, a novel sparse voxel transformer, and discuss a new AI research proposal from UC Berkeley. Stay informed on the cutting-edge developments in the field by reading on!
Did you like Flan-T5? Now check out the new open-source Flan-UL2 20B: Flan-UL2 (20B parameters) from Google is so far the best open-source LLM out there, as measured on MMLU (55.7) and BigBench Hard (45.9), surpassing Flan-T5-XXL (11B). It has been instruction fine-tuned with a 2048-token context window.
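If you want to try it yourself, here is a minimal sketch that assumes the public google/flan-ul2 checkpoint on the Hugging Face Hub, plus accelerate and bitsandbytes for 8-bit loading; the 20B weights still need a large GPU, so treat this as illustrative rather than a turnkey recipe.

```python
# Minimal sketch: load Flan-UL2 in 8-bit and run one instruction-style prompt.
# Assumes the "google/flan-ul2" checkpoint plus accelerate + bitsandbytes.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "google/flan-ul2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(
    model_name, device_map="auto", load_in_8bit=True  # 8-bit so 20B params fit on one big GPU
)

prompt = ("Answer the following question by reasoning step by step. "
          "If a train travels 60 km in 30 minutes, what is its speed in km/h?")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```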
What happens when we train the largest vision-language model and add in robot experiences? Meet PaLM-E 🌴, a 562-billion-parameter, general-purpose, embodied visual-language model spanning robotics, vision, and language. PaLM-E enables robot planning directly from pixels, all in a single model trained end-to-end. PaLM-E is the largest VLM reported to date.

Is ChatGPT really 175 billion parameters? 🤔 This blog post from Owen concretely disproves this theory with publicly available information and verifiable, reproducible analysis. LLM weights are typically stored as 8-bit integers (INT8) for lower-latency inference, higher throughput, and a 2x smaller memory footprint compared to float16. Each INT8 parameter takes 1 byte, so simple math shows a 175B-parameter model would need about 175 GB just to store.
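The arithmetic behind that claim is easy to check yourself; the snippet below just multiplies the purported parameter count by bytes per weight for the two formats mentioned.

```python
# Back-of-the-envelope check of the storage argument above.
params = 175e9  # purported parameter count

bytes_int8 = params * 1   # 1 byte per INT8 weight
bytes_fp16 = params * 2   # 2 bytes per float16 weight

print(f"INT8:    {bytes_int8 / 1e9:.0f} GB")   # ~175 GB
print(f"float16: {bytes_fp16 / 1e9:.0f} GB")   # ~350 GB
```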
VoxFormer: a novel sparse voxel transformer that improves the efficiency and accuracy of camera-based 3D semantic scene completion. It uses a two-stage approach: first it predicts a sparse set of 3D voxels from RGB images, then it refines them to obtain the final dense voxel representation. The architecture is a transformer-based encoder-decoder network designed to handle both local and global context information.
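To make the sparse-to-dense idea concrete, here is a toy, hypothetical PyTorch sketch (not the authors' VoxFormer code): stage 1 keeps only a top-k set of "likely occupied" voxel queries, stage 2 refines them with a small transformer and scatters the result back into a dense semantic grid. All module and tensor names are made up for illustration.

```python
# Toy two-stage sparse-to-dense voxel sketch (hypothetical, not VoxFormer itself).
import torch
import torch.nn as nn

class TwoStageVoxelSketch(nn.Module):
    def __init__(self, num_voxels=4096, feat_dim=64, num_classes=20, top_k=256):
        super().__init__()
        self.top_k = top_k
        self.occupancy_head = nn.Linear(feat_dim, 1)      # stage 1: score each voxel
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True)
        self.refiner = nn.TransformerEncoder(layer, num_layers=2)   # stage 2: refine sparse set
        self.semantic_head = nn.Linear(feat_dim, num_classes)
        self.empty_logit = nn.Parameter(torch.zeros(num_classes))   # filler for unselected voxels

    def forward(self, voxel_feats):                        # (B, num_voxels, feat_dim)
        B, N, D = voxel_feats.shape
        occ = self.occupancy_head(voxel_feats).squeeze(-1)           # (B, N) occupancy scores
        idx = occ.topk(self.top_k, dim=1).indices                    # sparse voxel queries
        sparse = torch.gather(voxel_feats, 1, idx.unsqueeze(-1).expand(-1, -1, D))
        refined = self.refiner(sparse)                               # attention over the sparse set
        sparse_logits = self.semantic_head(refined)                  # (B, top_k, num_classes)
        dense = self.empty_logit.expand(B, N, -1).clone()            # dense grid, default "empty"
        dense.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, sparse_logits.size(-1)), sparse_logits)
        return dense                                                 # (B, num_voxels, num_classes)

feats = torch.randn(2, 4096, 64)            # stand-in for features lifted from RGB images
print(TwoStageVoxelSketch()(feats).shape)   # torch.Size([2, 4096, 20])
```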
🚀 How can meta-learning, self-attention, and JAX power the next generation of evolutionary optimizers? A new AI paper from DeepMind leverages the insight that Evolution Strategies (ES) updates are inherently set operations and parametrizes a new family of ES optimization algorithms using tiny Set Transformers. The weights of this attention-based, ∇-free optimizer are meta-evolved on a class of diverse optimization tasks. The resulting learned evolution strategy (LES) transfers strongly to previously unseen tasks, longer horizons, and larger search spaces. Importantly, it generalizes to high-dimensional neural network tasks it was never meta-trained on!
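To see why an ES update is a set operation, here is a minimal, hypothetical NumPy sketch of one vanilla ES generation: the update is a permutation-invariant weighted sum over the sampled perturbations. In the paper's framing, LES replaces these fixed rank-based recombination weights with ones produced by a tiny Set Transformer; the sketch below is the vanilla version only, with all names chosen for illustration.

```python
# One vanilla ES generation: a permutation-invariant (set) operation over perturbations.
import numpy as np

def es_step(mean, sigma, fitness_fn, pop_size=64, lr=0.05, rng=np.random.default_rng(0)):
    eps = rng.standard_normal((pop_size, mean.size))         # set of perturbations
    fitness = np.array([fitness_fn(mean + sigma * e) for e in eps])
    ranks = fitness.argsort().argsort()                       # rank-based, order-free weights
    weights = ranks / (pop_size - 1) - 0.5                    # centered rank transform, sums to 0
    grad_est = (weights[:, None] * eps).sum(0) / (pop_size * sigma)
    return mean + lr * grad_est                               # ascend the estimated gradient

# Toy usage: maximize -||x - 3||^2, so the search mean should drift toward 3.
mean = np.zeros(10)
for _ in range(200):
    mean = es_step(mean, sigma=0.1, fitness_fn=lambda x: -np.sum((x - 3.0) ** 2))
print(mean.round(2))
```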

D5 task: new AI research from UC Berkeley proposes a D5 task and a benchmark dataset to make LLMs do research. The D5 task, proposed by the researchers, is goal-driven discovery of differences between distributions via language descriptions. A discovered description must meet two criteria: (1) it must be valid, that is, the predicate is more often true for corpus A than for corpus B, and (2) it must be driven by the research goal and hence be relevant, innovative, and noteworthy.
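As a toy illustration of criterion (1) only, the sketch below estimates how much more often a candidate description holds on corpus A than on corpus B. The `holds` function is a hypothetical stand-in for the language-model-based validator; the corpora and keyword check are made up for the example.

```python
# Toy check of criterion (1): does the description hold more often on A than on B?
def validity(holds, corpus_a, corpus_b):
    rate_a = sum(holds(x) for x in corpus_a) / len(corpus_a)
    rate_b = sum(holds(x) for x in corpus_b) / len(corpus_b)
    return rate_a - rate_b            # > 0 means the description fits A more than B

corpus_a = ["the battery dies within a day", "battery drains overnight"]
corpus_b = ["great screen", "battery easily lasts two days"]
holds = lambda text: "battery" in text and ("dies" in text or "drains" in text)
print(validity(holds, corpus_a, corpus_b))   # 1.0 for "complains about fast battery drain"
```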