Overview
This model, gemma-3-1b-it-ghigliottina-grpo-merged-ckpt564, is a 1-billion-parameter fine-tune of the Gemma-3-1B-IT base model by nazdef. It is designed for the Italian word game "Ghigliottina," where the goal is to find the single common word linking five given clues. The model is a merged version of the base model and a LoRA adapter, so it can be loaded directly as a standalone model without applying the adapter separately.
Key Capabilities
- Ghigliottina Game Solving: Optimized to identify the common word connecting five bullet-point clues in Italian.
- Structured Output: Trained to produce output in a specific format, including a `<think>` section for reasoning and a `soluzione: <parola>.` line for the final answer.
- GRPO Training: Utilizes a custom GRPO (Group Relative Policy Optimization) pipeline with multi-component reward shaping, including format rewards, exact match, embedding similarity, and reasoning rewards.
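Because the model emits a fixed structure, downstream code can recover both the reasoning and the answer with simple pattern matching. The sketch below assumes exactly the format described above (a `<think>…</think>` block followed by a `soluzione: <parola>.` line); the function name and regexes are illustrative, not part of the model's tooling.

```python
import re

def parse_ghigliottina_output(text: str):
    """Extract reasoning and final answer from the model's structured output.

    Assumes the format described in the model card: an optional
    <think>...</think> block followed by a line like `soluzione: parola.`
    """
    think_match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    reasoning = think_match.group(1).strip() if think_match else None

    answer_match = re.search(r"soluzione:\s*(\w+)", text, re.IGNORECASE)
    answer = answer_match.group(1).lower() if answer_match else None
    return reasoning, answer

example = "<think>Tutti gli indizi richiamano il tempo.</think>\nsoluzione: tempo."
print(parse_ghigliottina_output(example))
# → ('Tutti gli indizi richiamano il tempo.', 'tempo')
```

Returning `None` for missing parts is useful here, since (as noted under Limitations) this checkpoint does not always adhere to the strict format.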
Use Cases
- Italian Word Game Applications: Ideal for integrating into applications that require solving the Ghigliottina game or similar word association tasks in Italian.
- Baseline for Further Development: Serves as a merged baseline model for continued experimentation and improvement in structured reasoning tasks.
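To integrate the model into such an application, the five clues have to be presented as the bullet-point list the model was trained on. The exact prompt wording below is an assumption (the card only states that five clues are given as bullet points in Italian); only the five-clue, one-bullet-per-clue structure is taken from the source.

```python
def build_prompt(clues):
    """Format five Ghigliottina clues as a bullet-point prompt.

    The instruction sentence is a hypothetical example; the model card
    does not specify the training prompt verbatim.
    """
    if len(clues) != 5:
        raise ValueError("La Ghigliottina richiede esattamente 5 indizi")
    bullet_list = "\n".join(f"- {clue}" for clue in clues)
    return (
        "Trova la parola che collega i seguenti cinque indizi:\n"
        f"{bullet_list}"
    )

print(build_prompt(["perso", "libero", "pieno", "reale", "scaduto"]))
```

The resulting string can then be passed to the merged model through a standard `transformers` text-generation pipeline, with no PEFT/LoRA loading step required.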
Limitations
- As an intermediate checkpoint, the model may not always perfectly adhere to the strict output format.
- The `exact_match` reward is still low at this specific checkpoint.