Overview
electroglyph/gemma-3-4b-it-unslop-GRPO-v3 is a 4.3-billion-parameter instruction-tuned model built on Google's gemma-3-4b-it and fine-tuned with GRPO (Group Relative Policy Optimization). This version is the third iteration of electroglyph's 'unslop' experiments, which focus on improving text generation quality and reducing the clichéd, repetitive phrasing ("slop") common in large language model output. The key differentiator is a refined training methodology with specific adjustments to sampling temperature and reward functions, aimed at more natural and diverse language.
Key Capabilities
- Improved Text Coherence: Sampling at a temperature of 1.0 during training significantly reduced 'weird' or nonsensical outputs, producing more coherent and logical responses.
- Reduced Repetitive Phrasing: The reward system was adjusted to allow a controlled number of complex sentences (e.g., with multiple commas), thereby cutting down on excessive parenthetical phrases without eliminating natural sentence structures.
- Enhanced Lexical Diversity: A lexical diversity score based on the Measure of Textual Lexical Diversity (MTLD), calibrated against a large corpus of books, was integrated into the reward system. This encourages the model to produce text with a richer vocabulary, targeting an MTLD score between 80 and 120.
Good for
- Applications requiring natural language generation where output quality and readability are paramount.
- Use cases demanding less repetitive and more varied text, such as creative writing, content generation, or advanced chatbots.
- Developers looking for a Gemma-based model with refined conversational characteristics and improved stylistic control.