herooooooooo/first-hf-run-pi-mono-gemma4-e2b-final

VISIONConcurrency Cost:1Model Size:5.1BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 22, 2026Architecture:Transformer Cold

The herooooooooo/first-hf-run-pi-mono-gemma4-e2b-final is a 5.1 billion parameter instruction-tuned language model based on Google's Gemma 4 E2B architecture, fine-tuned using Supervised Fine-Tuning (SFT). It was trained on redacted coding-agent session traces from the `badlogicgames/pi-mono` dataset, specifically optimized for coding-related tasks. This model excels at understanding and generating code, particularly within the context of the `pi` coding agent's interaction style. Its primary strength lies in its specialized fine-tuning for code generation and understanding based on real-world coding session data.

Loading preview...

Model Overview

The herooooooooo/first-hf-run-pi-mono-gemma4-e2b-final is a 5.1 billion parameter instruction-tuned model built upon the google/gemma-4-E2B-it base architecture. This model has been specifically fine-tuned using Supervised Fine-Tuning (SFT) on a unique dataset of redacted coding-agent session traces from badlogicgames/pi-mono.

Key Capabilities

  • Specialized Code Understanding: Fine-tuned on actual coding session data, making it adept at interpreting and generating code within a coding agent context.
  • Instruction Following: Optimized for instruction-based tasks, leveraging its base Gemma 4 E2B-it architecture.
  • Text-Only Training: The model's training focused exclusively on text data, omitting image, audio, and video payloads from the dataset.

Training and Selection

The model was selected based on the lowest held-out SFT evaluation loss from a sweep of different adapter configurations. The dataset used for fine-tuning was carefully processed, including deterministic redaction, deny-pattern filtering, and stripping of assistant thinking blocks and signatures to prevent training on raw hidden traces. Tool calls were represented as text, and tool results were folded into user context.

Known Limitations

  • Dataset Specificity: The model's performance may be optimized for the interaction style present in the pi-mono codebase, potentially leading to overfitting for that specific style.
  • Evaluation Metrics: Initial evaluations on HumanEval and MBPP benchmarks showed 0.0 accuracy, indicating that these standard benchmarks may not fully capture the model's specialized capabilities or that further evaluation is needed. The model was selected by SFT eval loss, not by these benchmarks.
  • Multimodal Input: Training was text-only; Gemma 4's multimodal capabilities were not fine-tuned or evaluated.