Google announces Gemma 4 open AI models, switches to Apache 2.0 license

April 3, 2026

So, here’s something that caught my attention: Google just dropped Gemma 4, its latest family of open AI models. According to Ryan Whitwam at Ars Technica, the models have made huge leaps over the past year, and developers can now get their hands on Gemma 4 in four sizes, all optimized for local use. Here’s where it gets interesting: Google is ditching its custom license and switching to Apache 2.0, giving users far more freedom. The big variants, a 26B Mixture of Experts and a 31B Dense model, are designed to run on a high-end 80GB Nvidia H100 GPU (think $20,000 hardware), but Google says they’ll fit on consumer GPUs if you quantize them to lower precision. Whitwam points out that the 26B model is optimized for speed, activating only a fraction of its parameters per token, while the 31B model focuses on quality. So what does this all mean? More accessible, flexible AI with less licensing fuss, and that’s pretty exciting for developers.

Google's Gemini AI models have improved by leaps and bounds over the past year, but you can only use Gemini on Google's terms. The company's Gemma open-weight models have provided more freedom, but Gemma 3, which launched over a year ago, is getting a bit long in the tooth. Starting today, developers can work with Gemma 4, which comes in four sizes optimized for local usage. Google has also acknowledged developer frustrations with AI licensing, so it's dropping the custom Gemma license in favor of Apache 2.0.

Like past versions of its open-weight models, Google has designed Gemma 4 to be usable on local machines. That can mean plenty of things, of course. The two large Gemma variants, 26B Mixture of Experts and 31B Dense, are designed to run unquantized in bfloat16 format on a single 80GB Nvidia H100 GPU. Granted, that's a $20,000 AI accelerator, but it's still local hardware. If quantized to run at lower precision, these big models will fit on consumer GPUs.
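To see why an 80GB card is the stated floor, a quick back-of-the-envelope estimate of the weight memory helps. This is only a sketch: it uses the parameter counts from the article, assumes bfloat16's 2 bytes per parameter (and roughly 0.5 bytes per parameter for a hypothetical 4-bit quantization), and ignores KV-cache and activation overhead.

```python
# Rough weight-memory estimate for Gemma 4's large variants.
# bfloat16 stores each parameter in 2 bytes; a 4-bit quantization
# stores roughly 0.5 bytes per parameter (ignoring scales/zero-points).

GIB = 1024**3

def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just for the model weights."""
    return params * bytes_per_param / GIB

for name, params in [("26B MoE", 26e9), ("31B Dense", 31e9)]:
    bf16 = weight_memory_gib(params, 2.0)  # unquantized bfloat16
    q4 = weight_memory_gib(params, 0.5)    # hypothetical 4-bit quantization
    print(f"{name}: ~{bf16:.0f} GiB in bf16, ~{q4:.0f} GiB at 4-bit")

# 26B MoE: ~48 GiB in bf16, ~12 GiB at 4-bit
# 31B Dense: ~58 GiB in bf16, ~14 GiB at 4-bit
```

At roughly 58 GiB for the dense model's weights alone, a single 80GB H100 leaves headroom for the KV cache, while a 4-bit version at around 14 GiB would plausibly fit on a 16GB or 24GB consumer card, which matches the article's framing.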

Google also claims it has focused on reducing latency to take full advantage of Gemma's local processing. The 26B Mixture of Experts model activates only 3.8 billion of its 26 billion parameters during inference, giving it a much higher tokens-per-second rate than similarly sized models. Meanwhile, 31B Dense is more about quality than speed, though Google expects developers to fine-tune it for specific uses.
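The article doesn't detail Gemma 4's routing internals, but the general Mixture of Experts idea behind that 3.8B-of-26B figure is easy to sketch: a small router scores the experts for each token, and only the top few actually run. Here is a minimal, purely illustrative NumPy version; the dimensions, expert count, and top-k value are made up for the example and are not Gemma's.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 64, 8, 2  # toy sizes, not Gemma's actual config
W_router = rng.normal(size=(D, N_EXPERTS))
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix the results."""
    logits = x @ W_router              # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]  # keep only the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the chosen experts
    # Only TOP_K of the N_EXPERTS weight matrices are touched per token,
    # which is where the tokens-per-second advantage comes from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=D)
print(moe_forward(token).shape)  # (64,)
```

With 3.8 billion of 26 billion parameters active, each token touches roughly 15 percent of the weights, which is why the MoE variant can push far more tokens per second than a dense model of the same total size.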
