Hi there,
Dive into the hottest AI breakthroughs of the week—handpicked just for you!
Super Important AI News 🔥 🔥 🔥
📢 Neural Magic Unveils Machete: A New Mixed-Input GEMM Kernel for NVIDIA Hopper GPUs
Featured AI Research 🛡️🛡️🛡️
SeedLM: A Post-Training Compression Method that Uses Pseudo-Random Generators to Efficiently Encode and Compress LLM Weights
Summary
SeedLM is a novel post-training compression method for Large Language Models (LLMs) that uses the seeds of pseudo-random generators to encode and compress model weights. The method trades increased compute for reduced memory accesses, which speeds up memory-bound workloads such as autoregressive generation. SeedLM outperforms state-of-the-art compression techniques, achieving near-identical accuracy at 4-bit compression, a significant feat for LLMs, especially because it requires no calibration data. The authors demonstrate SeedLM on Llama 2 and Llama 3 models, which are particularly challenging to compress, showing significantly better zero-shot accuracy retention at 4 and 3 bits than existing methods. Furthermore, FPGA-based tests show that SeedLM approaches a 4x speed-up over an FP16 baseline as the model size increases to 70B…
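To make the mechanism concrete, here is a minimal sketch of the seed-based encoding idea in Python. It is an illustration under assumptions, not the authors' implementation: the block size, number of basis vectors, and seed search width are hypothetical choices, and NumPy's generator stands in for whatever hardware-friendly pseudo-random generator is used in practice.

```python
# Sketch of the SeedLM idea: approximate each weight block as a linear
# combination of columns from a pseudo-random matrix that is fully
# determined by an integer seed, so only the seed and a few coefficients
# need to be stored. All constants below are hypothetical choices.

import numpy as np

BLOCK = 8        # weights per block (hypothetical)
LATENT = 3       # pseudo-random basis vectors per block (hypothetical)
SEED_BITS = 10   # size of the candidate-seed search space (hypothetical)

def basis_from_seed(seed: int) -> np.ndarray:
    """Deterministically expand a seed into a BLOCK x LATENT random basis.
    NumPy's generator is a stand-in for a hardware-friendly PRNG."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((BLOCK, LATENT))

def compress_block(w: np.ndarray) -> tuple[int, np.ndarray]:
    """Search candidate seeds; keep the one whose basis gives the best
    least-squares reconstruction of the weight block."""
    best_seed, best_coeffs, best_err = 0, None, np.inf
    for seed in range(2 ** SEED_BITS):
        U = basis_from_seed(seed)
        coeffs, *_ = np.linalg.lstsq(U, w, rcond=None)
        err = np.linalg.norm(U @ coeffs - w)
        if err < best_err:
            best_seed, best_coeffs, best_err = seed, coeffs, err
    return best_seed, best_coeffs  # coefficients would be quantized in practice

def decompress_block(seed: int, coeffs: np.ndarray) -> np.ndarray:
    """Reconstruct the block at inference time: regenerating the basis
    from the seed trades extra compute for fewer memory accesses."""
    return basis_from_seed(seed) @ coeffs

w = np.random.default_rng(0).standard_normal(BLOCK)
seed, coeffs = compress_block(w)
print("reconstruction error:", np.linalg.norm(decompress_block(seed, coeffs) - w))
```

The key point the sketch captures is the compute-for-memory trade: at generation time only the seed and a handful of coefficients are read from memory, and the basis matrix is recomputed on the fly.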