nazdef/gemma-3-1b-it-ghigliottina-grpo-merged-ckpt564
`nazdef/gemma-3-1b-it-ghigliottina-grpo-merged-ckpt564` is a 1-billion-parameter Gemma-3-1B-IT model, fine-tuned by nazdef and optimized for solving the Italian "Ghigliottina" word game. The LoRA adapter was merged into the base weights via `merge_and_unload()`, so the model loads as a standalone checkpoint and excels at identifying the common word linking five given clues. It is trained to produce structured output: a thinking section followed by the solution in a precise format.
Overview
This model, `gemma-3-1b-it-ghigliottina-grpo-merged-ckpt564`, is a 1-billion-parameter Gemma-3-1B-IT base model fine-tuned by nazdef. It's specifically designed for the Italian word game "Ghigliottina," where the goal is to find a single common word linking five given clues. The model is a merged version of the base model and a LoRA adapter, making it a standalone, directly loadable model that requires no separate adapter application.
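Because the adapter is already merged, the checkpoint can be loaded directly with `transformers`. A minimal sketch follows; the Italian prompt wording, the `solve` helper name, and the generation settings are assumptions for illustration, not taken from the actual training or inference setup:

```python
MODEL_ID = "nazdef/gemma-3-1b-it-ghigliottina-grpo-merged-ckpt564"

def build_prompt(clues):
    """Format the five clues as a bullet list (prompt wording is an assumption)."""
    bullets = "\n".join(f"- {clue}" for clue in clues)
    return f"Trova la parola che collega i seguenti cinque indizi:\n{bullets}"

def solve(clues, max_new_tokens=256):
    """Load the merged checkpoint and generate a structured answer."""
    # Imported lazily so the prompt helper stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    messages = [{"role": "user", "content": build_prompt(clues)}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

No separate PEFT loading step is needed, which is the main practical benefit of the merged checkpoint.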
Key Capabilities
- Ghigliottina Game Solving: Optimized to identify the common word connecting five bullet-point clues in Italian.
- Structured Output: Trained to produce output in a specific format, including a `<think>` section for reasoning and a `soluzione: <parola>` line for the final answer.
- GRPO Training: Utilizes a custom GRPO (Group Relative Policy Optimization) pipeline with multi-component reward shaping, including format rewards, exact match, embedding similarity, and reasoning rewards.
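The reward components above can be illustrated with a small sketch. Everything below (the regex, the weights, the function names) is a hypothetical illustration of multi-component reward shaping, not the actual training code, and the embedding-similarity and reasoning rewards are omitted:

```python
import re

# Expected completion shape per the card: a <think>...</think> block
# followed by "soluzione: <parola>". The regex itself is an assumption.
FORMAT_RE = re.compile(r"<think>.*?</think>.*?soluzione:\s*(\w+)", re.DOTALL)

def format_reward(completion):
    """1.0 if the completion matches the expected structure, else 0.0."""
    return 1.0 if FORMAT_RE.search(completion) else 0.0

def exact_match_reward(completion, answer):
    """1.0 if the extracted solution equals the gold word (case-insensitive)."""
    m = FORMAT_RE.search(completion)
    return 1.0 if m and m.group(1).lower() == answer.lower() else 0.0

def shaped_reward(completion, answer, weights=(0.3, 0.7)):
    """Weighted sum of component rewards (the weights here are made up)."""
    w_fmt, w_em = weights
    return w_fmt * format_reward(completion) + w_em * exact_match_reward(completion, answer)
```

In a real GRPO pipeline, such shaped rewards are computed per sampled completion and used to rank a group of rollouts for the same prompt.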
Use Cases
- Italian Word Game Applications: Ideal for integrating into applications that require solving the Ghigliottina game or similar word association tasks in Italian.
- Baseline for Further Development: Serves as a merged baseline model for continued experimentation and improvement in structured reasoning tasks.
Limitations
- As an intermediate checkpoint, the model may not always perfectly adhere to the strict output format.
- The `exact_match` performance is noted as not yet high at this specific checkpoint.
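Because format adherence is not guaranteed at this checkpoint, downstream code should parse the output defensively. A minimal sketch, assuming the `soluzione: <parola>` format described above (the function name and regex are illustrative):

```python
import re

def extract_solution(completion):
    """Return the predicted word, or None if the output violates the format."""
    # Take the last occurrence in case the model repeats or rambles.
    matches = re.findall(r"soluzione:\s*(\w+)", completion, re.IGNORECASE)
    return matches[-1] if matches else None
```

Returning `None` on malformed output lets callers fall back (e.g., to resampling) instead of crashing on an imperfect generation.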