May 7, 2026
Google's Gemma 4 AI models get 3x speed boost by predicting future tokens
Ever wonder how AI models get faster without new hardware? Google's Gemma 4 just made a giant leap, thanks to a technique called speculative decoding. According to Ryan Whitwam, writing in Technology, these models predict several future tokens at once, speeding up generation by as much as three times.

Here's where it gets interesting: Google's new Multi-Token Prediction (MTP) technique lets Gemma guess what's coming next and then verify those guesses, making it much quicker on local hardware. That matters because, as Whitwam points out, Gemma was built to run on Google's custom TPUs; with MTP, it now runs faster on consumer GPUs too. And with the license change to Apache 2.0, Google is making it easier for developers to tinker with the tech themselves.

So what does this mean for you? Faster, more private AI on your own device, with no cloud needed. Hardware limitations still matter, though. The real question is who will harness this breakthrough before it becomes the new standard.
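To make the idea concrete, here is a minimal toy sketch of the general draft-then-verify pattern behind speculative decoding. This is not Gemma's or MTP's actual implementation; the "models" below are hypothetical stand-ins (exact lookups over a fixed string) that exist only to show the mechanics: a cheap draft step proposes several future tokens, and a single verification pass keeps the longest prefix the full model agrees with.

```python
# Toy illustration of draft-then-verify speculative decoding.
# All names here are hypothetical stand-ins, not a real model API.

TEXT = list("the quick brown fox jumps over the lazy dog")

def target_next(context):
    # Stand-in for the large "target" model: next token by position.
    return TEXT[len(context)] if len(context) < len(TEXT) else None

def draft_next(context):
    # Stand-in for the cheap draft model: usually right, but wrong
    # on every 5th position so we can see rejections happen.
    tok = target_next(context)
    if tok is None:
        return None
    return "?" if len(context) % 5 == 4 else tok

def speculative_decode(k=4):
    out, verify_calls = [], 0
    while len(out) < len(TEXT):
        # 1) Draft up to k future tokens cheaply.
        drafts, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            if t is None:
                break
            drafts.append(t)
            ctx.append(t)
        # 2) One verification pass: accept the longest agreeing prefix.
        #    (A real model would score all drafted positions in parallel.)
        verify_calls += 1
        accepted = 0
        for i, t in enumerate(drafts):
            if target_next(out + drafts[:i]) == t:
                accepted += 1
            else:
                break
        out.extend(drafts[:accepted])
        # 3) On a rejection (or an empty draft), emit one target token,
        #    so output always matches what the target model would say.
        if accepted < len(drafts) or not drafts:
            nxt = target_next(out)
            if nxt is not None:
                out.append(nxt)
    return "".join(out), verify_calls

text, calls = speculative_decode()
print(text)                          # identical to plain decoding
print(calls, "passes vs", len(TEXT), "sequential steps")  # 17 vs 43
```

The key property this toy preserves: the output is exactly what the target model alone would produce, but it needs far fewer verification passes than one-token-at-a-time decoding, which is where the speedup comes from.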