autotrust/gemma4-31B-Fable-5-Distilled

Vision · Concurrency Cost: 2 · Model Size: 31B · Quant: FP8 · Ctx Length: 32k · Tool Calling: Supported · Published: Jun 23, 2026 · License: gemma · Architecture: Transformer

autotrust/gemma4-31B-Fable-5-Distilled is a 31.27 billion parameter Gemma 4-based model from AutoTrust AI Lab, fine-tuned with LoRA on agentic coding traces from Fable 5. It improves coding and tool-use performance, reaching 92.7% pass@1 on HumanEval, while preserving the base model's multimodal vision capabilities. The model targets agentic code generation, tool-use planning, and image description, which makes it suitable for complex coding and visual reasoning tasks.


What the fuck is this model about?

autotrust/gemma4-31B-Fable-5-Distilled is a 31.27 billion parameter model developed by AutoTrust AI Lab, built upon Google's gemma-4-31B-it base. It's a parameter-efficient fine-tune (LoRA) specifically designed to boost agentic coding and tool-use performance.

What makes THIS different from all the other models?

The main difference is the layer-freezing strategy used during fine-tuning. Many coding fine-tunes degrade multimodal capabilities. Fable-5-Distilled instead applies LoRA adapters only to the upper half of the transformer stack (layers 30-59) and leaves the lower layers (0-29) frozen. Because the lower layers are untouched, the base model's multimodal vision capabilities are preserved while coding performance still improves substantially.
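
The layer split described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' training code: the 60-layer count is inferred from the quoted ranges (0-29 and 30-59), and the module-name pattern `layers.<index>` is a hypothetical naming convention that a real checkpoint may not follow. In the PEFT library, `LoraConfig` accepts a `layers_to_transform` list, which is one way such a selection could be expressed. The source does not say which mechanism was used.

```python
import re

NUM_LAYERS = 60            # assumption: layers 0-59, inferred from the ranges in the text
FIRST_ADAPTED_LAYER = 30   # LoRA adapters start here; layers below stay frozen

def adapted_layers(num_layers=NUM_LAYERS, first=FIRST_ADAPTED_LAYER):
    """Return the indices of the layers that would receive LoRA adapters."""
    return list(range(first, num_layers))

# Hypothetical module-name pattern such as "model.layers.42.self_attn.q_proj".
_LAYER_RE = re.compile(r"layers\.(\d+)(?:\.|$)")

def is_adapted(module_name, first=FIRST_ADAPTED_LAYER):
    """True if the module sits in a layer at or above the adaptation cutoff."""
    match = _LAYER_RE.search(module_name)
    return match is not None and int(match.group(1)) >= first

if __name__ == "__main__":
    layers = adapted_layers()
    print(f"adapted: {layers[0]}-{layers[-1]} ({len(layers)} layers)")
    print(is_adapted("model.layers.42.self_attn.q_proj"))
    print(is_adapted("model.layers.7.mlp.up_proj"))
```

A real setup would also need to keep any vision-encoder modules out of the adapter targets, since their names may contain layer indices of their own.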

Key Differentiators:

  • Preserved Multimodal Vision: Maintains image description quality identical to the base Gemma 4 model.
  • Exceptional Coding Performance: Achieves 92.7% pass@1 on HumanEval, a +15.9 point improvement over the base google/gemma-4-31B-it (76.8%).
  • Efficient Fine-tuning: The gain comes from training only 0.20% of parameters (61.2M out of 31.27B) on a small, curated dataset of 308 examples.
  • Agentic Capabilities: Trained on agentic coding traces from Fable 5, enabling chain-of-thought reasoning and structured JSON tool-call outputs.
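
The two headline figures in this list can be checked with simple arithmetic. The snippet below only recomputes them from the numbers quoted above. The variable names are chosen for illustration.

```python
# Figures quoted in the list above.
base_pass_at_1 = 76.8      # google/gemma-4-31B-it, HumanEval pass@1 (%)
tuned_pass_at_1 = 92.7     # Fable-5-Distilled, HumanEval pass@1 (%)
trainable_params = 61.2e6  # parameters updated by LoRA
total_params = 31.27e9     # total model parameters

# Improvement in percentage points, rounded to one decimal place.
uplift_points = round(tuned_pass_at_1 - base_pass_at_1, 1)

# Share of parameters that are trainable, as a percentage.
trainable_pct = round(100 * trainable_params / total_params, 2)

print(f"uplift: +{uplift_points:.1f} points")
print(f"trainable share: {trainable_pct:.2f}%")
```

Both results agree with the stated +15.9 points and 0.20%.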

Should I use this for my use case?

Good for:

  • Agentic Code Generation & Explanation: If your application requires a model that can generate code, explain it, and perform chain-of-thought reasoning.
  • Tool-Use Planning: For scenarios where the model needs to output structured JSON for tool invocations.
  • Multimodal Applications: When you need strong coding capabilities without sacrificing the ability to process and describe images.
  • General-Purpose Chat with Thinking: The model is trained with enable_thinking=True for more robust and reasoned responses.
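
For the tool-use case, the calling application has to validate and dispatch whatever JSON the model emits. The sketch below shows one way to do that with the standard library. The `{"name": ..., "arguments": {...}}` shape and the `add` tool are hypothetical, chosen for illustration. The model card does not specify the exact schema.

```python
import json

def parse_tool_call(text):
    """Return (name, arguments) if text is a well-formed tool call, else None.

    Assumes a hypothetical schema: {"name": str, "arguments": dict}.
    """
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    name = payload.get("name")
    arguments = payload.get("arguments", {})
    if not isinstance(name, str) or not isinstance(arguments, dict):
        return None
    return name, arguments

def dispatch(text, tools):
    """Run the tool named in the model output; return None if it is not a tool call."""
    call = parse_tool_call(text)
    if call is None:
        return None
    name, arguments = call
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name](**arguments)

if __name__ == "__main__":
    tools = {"add": lambda a, b: a + b}  # hypothetical tool registry
    model_output = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
    print(dispatch(model_output, tools))
```

Returning `None` for text that is not a tool call lets the caller fall back to treating the output as an ordinary chat reply.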

Consider Alternatives if:

  • You need a model for tasks not related to coding, tool-use, or multimodal understanding, as its specialization might not be fully utilized.
  • You need broad generalization across highly diverse coding domains: the fine-tuning dataset is small (308 examples) and was chosen for quality over coverage.
  • You cannot run with enable_thinking=True in production: the model was trained with thinking enabled, so responses without it may be suboptimal.