Overview
electroglyph/gemma-3-4b-it-unslop-GSPO is an experimental 4.3-billion-parameter language model derived from google/gemma-3-4b-it. This iteration explores the GSPO (Group Sequence Policy Optimization) finetuning technique, specifically with a lower rank setting (16) than previous experiments used.
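For orientation, here is a minimal inference sketch using the Hugging Face transformers text-generation pipeline. The prompt and generation settings are illustrative assumptions; depending on the transformers version, the multimodal Gemma 3 checkpoints may need the image-text-to-text pipeline instead.

```python
# A minimal inference sketch; assumes a recent transformers release with
# Gemma 3 support, plus accelerate for device_map="auto".
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="electroglyph/gemma-3-4b-it-unslop-GSPO",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-format input; the pipeline applies the model's chat template.
messages = [{"role": "user", "content": "Explain what a LoRA rank controls."}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```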
Key Characteristics
- Base Model: Google's Gemma-3-4b-it.
- Finetuning Method: Utilizes GSPO, an evolution of the author's earlier GRPO experiments, aiming for improved training stability at lower ranks.
- Context Length: Supports a substantial context window of 32768 tokens.
- Experimental Focus: This version investigates how a reduced GSPO rank influences model behavior, noting differences in markdown suppression and overall feel compared to earlier finetunes (see the configuration sketch after this list).
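The rank-16 setting is most naturally read as a LoRA adapter rank. As an illustration only, here is how such a setup might look with peft and TRL; the alpha value, target modules, and the sequence-level importance-sampling flag are assumptions about the recipe, not settings confirmed by this card.

```python
# Illustrative only: assumes the rank refers to a LoRA adapter rank and that
# training ran through TRL's GRPO trainer with sequence-level (GSPO-style)
# importance sampling. None of these values are the author's confirmed recipe.
from peft import LoraConfig
from trl import GRPOConfig

peft_config = LoraConfig(
    r=16,                           # the lower rank this experiment explores
    lora_alpha=32,                  # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed modules
    task_type="CAUSAL_LM",
)

training_args = GRPOConfig(
    output_dir="gemma-3-4b-it-unslop-GSPO",
    importance_sampling_level="sequence",  # GSPO-style sequence-level ratios (TRL >= 0.20)
)
```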
Potential Use Cases
- Research and Development: Ideal for researchers and developers interested in the effects of different finetuning parameters, particularly GSPO and rank settings, on Gemma-based models.
- Instruction Following: As an instruction-tuned model, it suits tasks that require adherence to specific prompts, though its experimental nature warrants careful evaluation before production use.
- Exploration of 'Unslop' Behavior: Users can evaluate its performance in scenarios where the 'unslop' characteristic (reduced verbosity or particular response styles) is desired, comparing its output to other finetunes (a rough comparison sketch follows this list).
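As a starting point for such comparisons, a rough side-by-side sketch follows. The prompt and the "slop" proxies (reply length, bold spans, bullet lines) are arbitrary choices for illustration, not an established benchmark.

```python
# A rough comparison sketch; the prompt and metrics are illustrative assumptions.
import torch
from transformers import pipeline

PROMPT = [{"role": "user", "content": "Explain overfitting to a beginner."}]

for model_id in ("google/gemma-3-4b-it", "electroglyph/gemma-3-4b-it-unslop-GSPO"):
    pipe = pipeline("text-generation", model=model_id,
                    torch_dtype=torch.bfloat16, device_map="auto")
    reply = pipe(PROMPT, max_new_tokens=256)[0]["generated_text"][-1]["content"]
    # Crude proxies for "slop": reply length, bold markers, and bullet lines.
    print(f"{model_id}: {len(reply)} chars, "
          f"{reply.count('**') // 2} bold spans, "
          f"{sum(line.lstrip().startswith(('-', '*')) for line in reply.splitlines())} bullets")
    del pipe  # free memory before loading the next checkpoint
    torch.cuda.empty_cache()
```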