Google just dropped Gemma 4, and the numbers are hard to ignore: a 31B model that, as of this writing, ranks fourth globally on the Arena AI leaderboard, above models with 10 to 30 times as many parameters. That is not a minor update. That is a different category of open model.

Gemma 4 is now available on Featherless. You can run all four variants today through the same OpenAI-compatible API you are already using.

What changed from Gemma 3

The short version: everything that matters.

Gemma 3's 27B model scored 20.8% on AIME 2026, the math reasoning benchmark. Gemma 4's 31B dense model scores 89.2%. On LiveCodeBench, coding performance jumped from 29.1% to 80.0%. On GPQA Diamond, a graduate-level science benchmark, it went from 42.4% to 84.3%.

These are not incremental improvements. The training recipe changed, the thinking mode was added, and the architecture was refined across all four model sizes. The result is a family that outperforms models several times its size on tasks that actually matter for production use.

Four models, one API

Gemma 4 ships as a family. Here is what is available:

Gemma 4 E2B and E4B are built for edge and on-device use, with 2.3B and 4.5B effective parameters respectively. They support text, images, and audio natively. The E2B runs in around 3 GB of memory at 4-bit quantization.
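The ~3 GB figure for the E2B lines up with a back-of-envelope estimate: weights at 4 bits per parameter, plus runtime overhead for activations, KV cache, and buffers. A minimal sketch, where the 1.5 GB overhead figure is an illustrative assumption, not a measured value:

```python
def quantized_memory_gb(params_billion: float, bits_per_weight: int,
                        overhead_gb: float = 1.5) -> float:
    """Rough memory estimate: weight storage at the given bit width
    plus a fixed allowance for activations, KV cache, and buffers."""
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb + overhead_gb

# 2.3B effective parameters at 4-bit: ~1.15 GB of weights,
# landing near 3 GB once runtime overhead is included.
print(round(quantized_memory_gb(2.3, 4), 2))
```

The same arithmetic explains why the E4B (4.5B effective parameters) still fits comfortably on consumer hardware at 4-bit.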

Gemma 4 26B A4B is a Mixture of Experts model. It activates only 3.8B parameters per forward pass while delivering performance close to the full 31B. It is fast and efficient, a good default for most production workloads.

Gemma 4 31B Dense is the flagship. 256K context, native image and video input, built-in thinking mode, and an Arena Elo of 1,452. At that score it sits above Qwen 3.5 397B and Llama 4 Maverick, both of which are vastly larger models.

All four share a 262K vocabulary, hybrid attention for long context, and tool calling support out of the box.
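Tool calling uses the standard OpenAI function-calling schema, which the Featherless endpoint accepts. A minimal sketch of the two pieces you own: the tool definition you pass to the API, and the local dispatcher that runs whatever tool the model asks for. The get_weather tool here is a made-up example, not part of any real API:

```python
import json

# A tool definition in the OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_name: str, arguments_json: str) -> str:
    """Route a model-issued tool call to the matching local function."""
    args = json.loads(arguments_json)
    if tool_name == "get_weather":
        # Stub: a real implementation would call a weather service.
        return json.dumps({"city": args["city"], "temp_c": 21})
    raise ValueError(f"unknown tool: {tool_name}")

print(dispatch("get_weather", '{"city": "Berlin"}'))
```

In practice you pass tools=tools to chat.completions.create, and when the response contains tool_calls, run dispatch() and send the result back as a {"role": "tool"} message.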

Why run it on Featherless

You do not want to manage GPU infrastructure for four model variants. You do not want to spin up separate endpoints, handle quantization, or worry about cold starts. Featherless gives you access to 30,000+ models, including the full Gemma 4 family, through one API endpoint with no setup.

The API is OpenAI-compatible. If you have existing code calling GPT or Claude, switching to Gemma 4 on Featherless is a two-line change.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",
    api_key="YOUR_FEATHERLESS_API_KEY",
)

response = client.chat.completions.create(
    model="google/gemma-4-31B",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(response.choices[0].message.content)
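For chat UIs and agentic loops you will usually want streaming, which the OpenAI SDK exposes via stream=True on the same call. A small sketch of the accumulator side, using fake chunk objects (built with SimpleNamespace) so it runs without a network call; the chunk shape matches what the SDK yields:

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Accumulate text deltas from a streamed chat completion.
    Works on the chunks yielded when stream=True is passed to
    client.chat.completions.create(...)."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is often None
            parts.append(delta)
    return "".join(parts)

def fake_chunk(text):
    """Stand-in for an SDK chunk, for local demonstration only."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

print(collect_stream([fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]))
# -> Hello
```

With the real client, replace the fake chunks with the iterator returned by chat.completions.create(..., stream=True).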


Flat-rate pricing means high-volume agentic workloads do not come with surprise bills. No logs means your prompts and completions stay private.

The Apache 2.0 shift matters

Previous Gemma releases used a custom license with restrictions on commercial use, synthetic data generation, and downstream enforcement requirements. Gemma 4 ships under Apache 2.0. That means no usage caps, no MAU limits, no legal ambiguity. You can fine-tune it, build products on top of it, and redistribute it without asking permission.

For teams evaluating foundation models for production, that change removes the last meaningful adoption barrier.

The honest tradeoff

Gemma 4 is not the answer for every workload. If raw coding throughput is your priority, Qwen 3.5 32B is worth evaluating.

What Gemma 4 does win clearly: human preference rankings, multilingual quality, vision tasks, and quality per parameter at the 30B tier. If you are building something where output quality matters more than raw throughput, it is the right call.

Try it now

Gemma 4 is live on Featherless. All four variants, no setup, same API you already know.

Sign up and start building at featherless.ai

Start building in under 3 minutes