autotrust/gemma4-31B-Fable-5-Distilled
autotrust/gemma4-31B-Fable-5-Distilled is a 31.27 billion parameter Gemma 4-based model from AutoTrust AI Lab, fine-tuned using LoRA on agentic coding traces from Fable 5. It significantly enhances coding and tool-use performance, achieving 92.7% on HumanEval pass@1, while uniquely preserving the base model's multimodal vision capabilities. This model is optimized for agentic code generation, tool-use planning, and image description, making it suitable for complex coding and visual reasoning tasks.
What the fuck is this model about?
autotrust/gemma4-31B-Fable-5-Distilled is a 31.27 billion parameter model developed by AutoTrust AI Lab, built upon Google's gemma-4-31B-it base. It's a parameter-efficient fine-tune (LoRA) specifically designed to boost agentic coding and tool-use performance.
What makes THIS different from all the other models?
This model stands out primarily due to its unique layer-freezing strategy during fine-tuning. Unlike many coding fine-tunes that degrade multimodal capabilities, Fable-5-Distilled applies LoRA adapters only to the upper half of the transformer stack (layers 30-59), leaving the lower layers (0-29) frozen. This ensures that the base model's multimodal vision capabilities are fully preserved while still achieving significant uplift in coding performance.
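The layer split described above can be sketched in plain Python. This is a minimal illustration, not the authors' training code: the 60-layer count is inferred from the stated ranges (0-29 frozen, 30-59 adapted), and the module-name pattern `layers.<index>.`, the `"vision"` substring check, and both helper names are assumptions made for the example.

```python
import re

NUM_LAYERS = 60           # inferred from the stated ranges 0-29 and 30-59
FIRST_ADAPTED_LAYER = 30  # layers below this index stay frozen

# Assumed naming scheme: transformer blocks appear as "...layers.<index>...."
_LAYER_RE = re.compile(r"\blayers\.(\d+)\.")


def adapted_layers(num_layers=NUM_LAYERS, first=FIRST_ADAPTED_LAYER):
    """Return the indices of the layers that receive LoRA adapters."""
    return list(range(first, num_layers))


def gets_adapter(module_name, first=FIRST_ADAPTED_LAYER):
    """Return True if a module belongs to an adapted language-model layer.

    Modules without a layer index (e.g. embeddings) and anything whose name
    contains "vision" (assumed to mark the vision encoder) count as frozen.
    """
    if "vision" in module_name:
        return False
    match = _LAYER_RE.search(module_name)
    return match is not None and int(match.group(1)) >= first


print(adapted_layers()[0], adapted_layers()[-1])  # 30 59
print(gets_adapter("model.language_model.layers.31.self_attn.q_proj"))  # True
print(gets_adapter("model.language_model.layers.29.self_attn.q_proj"))  # False
```

With Hugging Face PEFT, a selection like this would typically be expressed through the `layers_to_transform` argument of `LoraConfig` rather than by name matching; the card does not state which mechanism was actually used.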
Key Differentiators:
- Preserved Multimodal Vision: Maintains image description quality identical to the base Gemma 4 model.
- Exceptional Coding Performance: Achieves 92.7% pass@1 on HumanEval, a +15.9 point improvement over the base google/gemma-4-31B-it (76.8%).
- Efficient Fine-tuning: This performance gain is achieved with only 0.20% of parameters trainable (61.2M out of 31.27B), demonstrating high-quality distillation from a small, curated dataset (308 examples).
- Agentic Capabilities: Trained on agentic coding traces from Fable 5, enabling chain-of-thought reasoning and structured JSON tool-call outputs.
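As a quick sanity check, the headline figures in the list above are mutually consistent. The snippet below only recomputes them from the numbers quoted on this card; the variable names are illustrative.

```python
TRAINABLE_PARAMS = 61.2e6  # 61.2M trainable LoRA parameters (from the card)
TOTAL_PARAMS = 31.27e9     # 31.27B total parameters (from the card)
BASE_PASS_AT_1 = 76.8      # google/gemma-4-31B-it on HumanEval, in percent
TUNED_PASS_AT_1 = 92.7     # this model on HumanEval, in percent

trainable_pct = 100 * TRAINABLE_PARAMS / TOTAL_PARAMS
uplift_points = TUNED_PASS_AT_1 - BASE_PASS_AT_1

print(f"trainable: {trainable_pct:.2f}%")         # trainable: 0.20%
print(f"uplift: {uplift_points:+.1f} points")     # uplift: +15.9 points
```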
Should I use this for my use case?
Good for:
- Agentic Code Generation & Explanation: If your application requires a model that can generate code, explain it, and perform chain-of-thought reasoning.
- Tool-Use Planning: For scenarios where the model needs to output structured JSON for tool invocations.
- Multimodal Applications: When you need strong coding capabilities without sacrificing the ability to process and describe images.
- General-Purpose Chat with Thinking: The model is trained with enable_thinking=True for more robust and reasoned responses.
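For the tool-use case in the list above, downstream code usually has to pull the JSON tool call out of the model's reply. The card does not document the exact schema, so the following is only a sketch under an assumed, hypothetical shape of `{"name": ..., "arguments": {...}}`; the helper name `extract_tool_call` and the sample reply are likewise made up for illustration.

```python
import json


def extract_tool_call(text):
    """Return the first JSON object in `text` that looks like a tool call.

    Assumed (hypothetical) shape: {"name": str, "arguments": dict}.
    Returns None if no such object is found.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if (
            isinstance(obj, dict)
            and isinstance(obj.get("name"), str)
            and isinstance(obj.get("arguments"), dict)
        ):
            return obj
        start = text.find("{", start + 1)
    return None


# Made-up reply text, not real model output.
reply = 'I will read the file first. {"name": "read_file", "arguments": {"path": "main.py"}}'
call = extract_tool_call(reply)
print(call["name"], call["arguments"])  # read_file {'path': 'main.py'}
```

Scanning for the first well-formed object, instead of calling `json.loads` on the whole reply, tolerates reasoning text around the call. If the model's actual output format differs, only the shape check needs to change.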
Consider Alternatives if:
- You need a model for tasks not related to coding, tool-use, or multimodal understanding, as its specialization might not be fully utilized.
- You require a model with a larger fine-tuning dataset for broader generalization across highly diverse coding domains (though this model's quality-first approach is effective).
- You cannot accommodate the model's preference for enable_thinking=True in production, as responses without thinking may be suboptimal.