electroglyph/gemma-3-4b-it-unslop-GSPO

  • Modality: text + vision
  • Parameters: 4.3B
  • Tensor type: BF16
  • Context length: 32768 tokens
  • License: gemma

Overview

electroglyph/gemma-3-4b-it-unslop-GSPO is an experimental 4.3-billion-parameter language model derived from google/gemma-3-4b-it. This iteration explores the GSPO (Group Sequence Policy Optimization) finetuning technique, specifically at a lower rank setting (16) than the previous experiments in this series.
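The card ships no usage snippet, so here is a minimal text-only loading sketch, assuming a recent transformers release with Gemma 3 support; the prompt text is illustrative:

```python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "electroglyph/gemma-3-4b-it-unslop-GSPO"

# Gemma 3 4B checkpoints are multimodal, so the conditional-generation
# class and processor are used even for text-only chat.
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Summarize GSPO in two sentences."}]}
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```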

Key Characteristics

  • Base Model: Google's Gemma-3-4b-it.
  • Finetuning Method: GSPO, an evolution of the author's earlier GRPO experiments, chosen for better stability at lower ranks.
  • Context Length: Supports a 32768-token context window.
  • Experimental Focus: Investigates how a reduced rank influences GSPO-trained behavior, noting differences in markdown suppression and overall feel compared to earlier finetunes; a hypothetical rank-16 adapter config is sketched after this list.
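The card does not publish the training recipe, so the following is only a sketch of what a rank-16 adapter configuration might look like with peft; every value except r=16 is an assumption:

```python
from peft import LoraConfig

# Hypothetical adapter config; only r=16 comes from the card.
adapter = LoraConfig(
    r=16,                 # the lower rank this experiment investigates
    lora_alpha=32,        # assumed: a common 2x-rank scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    lora_dropout=0.05,    # assumed
    task_type="CAUSAL_LM",
)
# The GSPO objective itself would be applied by whichever RL trainer
# optimizes the policy through this adapter; that code is not shown here.
```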

Potential Use Cases

  • Research and Development: Ideal for researchers and developers interested in the effects of different finetuning parameters, particularly GSPO and rank settings, on Gemma-based models.
  • Instruction Following: As an instruction-tuned model, it is suitable for tasks requiring adherence to specific prompts, though its experimental nature calls for careful evaluation before any production use.
  • Exploration of 'Unslop' Behavior: Users can evaluate its performance in scenarios where the 'unslop' characteristic (reduced verbosity or a particular response style) is desired, comparing its output to other finetunes; see the comparison sketch after this list.
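A purely illustrative comparison harness (not from the card): generate the same prompt with the base model and this finetune to eyeball differences in verbosity and markdown use:

```python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

prompt = [
    {"role": "user", "content": [{"type": "text", "text": "Explain overfitting to a beginner."}]}
]

for model_id in ("google/gemma-3-4b-it", "electroglyph/gemma-3-4b-it-unslop-GSPO"):
    model = Gemma3ForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)
    inputs = processor.apply_chat_template(
        prompt, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=200)
    print(f"=== {model_id} ===")
    print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
    del model  # free GPU memory before loading the next checkpoint
```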