Lambent/Qwen3-4B-Base-Continued-GRPO-Style-Karcher

Task: Text Generation
Concurrency Cost: 1
Model Size: 4B
Quant: BF16
Ctx Length: 32k
Published: Feb 5, 2026
License: apache-2.0
Architecture: Transformer
Open Weights

Lambent/Qwen3-4B-Base-Continued-GRPO-Style-Karcher is a 4 billion parameter language model based on the Qwen3-4B-Base architecture, produced by merging domain-specific adapters with the Karcher Mean method. Relative to the base model, it reports lower perplexity on lambada_openai and higher diversity metrics, particularly Distinct-1 and pairwise diversity on domains such as ao3_english and bbc_news. It is aimed at generating more varied, less repetitive text across diverse domains.


Model Overview

Lambent/Qwen3-4B-Base-Continued-GRPO-Style-Karcher is a 4 billion parameter model built on the Qwen3-4B-Base architecture. It was created by applying a Karcher Mean merge to multiple adapters. Each adapter was trained on a specific domain, using a SmolLM2-360M proxy model to fit style and lower perplexity. The merging process aimed to preserve distribution entropy through an attempted implementation of MARA.
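
For intuition, the Karcher mean of a set of points on a manifold is the point that minimizes the sum of squared geodesic distances to them. The following is a minimal sketch on the unit sphere, assuming each input is a flattened weight vector that is normalized before averaging. It is not MergeKit's implementation; the function name `karcher_mean_sphere` and its parameters are illustrative.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def karcher_mean_sphere(points, weights=None, tol=1e-10, max_iter=100):
    """Iteratively estimate the Karcher (Frechet) mean on the unit sphere.

    Assumptions (illustrative only): weights sum to 1, the points are not
    antipodal to the running estimate, and their arithmetic mean is nonzero.
    """
    pts = [_normalize(p) for p in points]
    k = len(pts)
    w = weights if weights is not None else [1.0 / k] * k
    dim = len(pts[0])

    # Start from the normalized weighted arithmetic mean.
    mu = _normalize([sum(wi * p[d] for wi, p in zip(w, pts)) for d in range(dim)])

    for _ in range(max_iter):
        # Log map: project every point into the tangent space at mu and average.
        tangent = [0.0] * dim
        for wi, p in zip(w, pts):
            c = max(-1.0, min(1.0, _dot(mu, p)))
            theta = math.acos(c)  # geodesic distance from mu to p
            if theta < 1e-12:
                continue  # p coincides with mu and contributes nothing
            scale = wi * theta / math.sin(theta)
            for d in range(dim):
                tangent[d] += scale * (p[d] - c * mu[d])

        step = math.sqrt(_dot(tangent, tangent))
        if step < tol:
            break  # mean tangent vector vanished: mu is a stationary point

        # Exp map: move along the geodesic in the direction of the mean tangent.
        mu = _normalize(
            [math.cos(step) * m + math.sin(step) * t / step for m, t in zip(mu, tangent)]
        )
    return mu
```

Unlike a plain arithmetic average, the result stays on the sphere, so the merged direction is not shrunk toward the origin when the inputs disagree. For points on a circle within the same half-circle, the result is the average of their angles.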

Key Capabilities and Performance

  • Improved Perplexity: On the lambada_openai task, the model's perplexity is 9.63% lower than the base model's and 3.21% lower than the GRPO-Wave variant's.
  • Enhanced Diversity: Generated text is more diverse than the base model's, with Distinct-1 increasing by 10.5% for ao3_english and 3.6% for bbc_news. Pairwise diversity also increased in several domains.
  • Reasoning Enhancement: An additional experiment optimized <think> tokens in the embedding space. This raised accuracy on GSM8k 3-shot reasoning traces to 96.7%, up from 90.0% with the original embeddings, and improved cross-entropy (CE) loss on a held-out evaluation set by 7.8%.
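
The diversity figures above can be illustrated with a small sketch. It makes several assumptions that the model card does not confirm: tokens are split on whitespace, the reported percentages are relative changes, and pairwise diversity is taken to be one minus the mean Jaccard similarity between pairs of samples. The evaluation actually used may define these differently, and all function names here are hypothetical.

```python
from itertools import combinations


def distinct_n(texts, n=1):
    """Ratio of unique n-grams to total n-grams across all samples."""
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()  # assumption: whitespace tokenization
        ngrams = list(zip(*(tokens[i:] for i in range(n))))
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


def pairwise_diversity(texts):
    """Assumed definition: 1 - mean Jaccard similarity over all sample pairs."""
    token_sets = [set(t.split()) for t in texts]
    sims = []
    for a, b in combinations(token_sets, 2):
        union = a | b
        sims.append(len(a & b) / len(union) if union else 1.0)
    return 1.0 - sum(sims) / len(sims) if sims else 0.0


def relative_change_pct(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0
```

Under these assumptions, a Distinct-1 increase of 10.5% means `relative_change_pct(merged_score, base_score)` evaluates to about 10.5, and the perplexity reductions correspond to negative values from the same function.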

Merge Details

This model was merged using MergeKit with the Karcher Mean method, combining adapters trained on diverse domains: GitHub code in several programming languages (Python, JavaScript, C++), academic papers (arXiv CS, Math, Physics), and general text (Wikipedia English, BBC News, AO3 English).
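
As a usage sketch, the merged checkpoint can presumably be loaded like any other Qwen3 causal language model. This assumes the repository is available on the Hugging Face Hub under the name above, ships a tokenizer, and works with the standard `transformers` auto classes. The helper `generate_text` and its sampling settings are illustrative, not recommendations from the model author.

```python
MODEL_ID = "Lambent/Qwen3-4B-Base-Continued-GRPO-Style-Karcher"


def generate_text(prompt, max_new_tokens=64):
    """Sample a continuation from the merged base model (hypothetical helper)."""
    # Imported lazily so the module can be inspected without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed in the model metadata.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,      # sampling, since the model targets varied output
            temperature=0.8,     # assumed value, not taken from the model card
            top_p=0.95,          # assumed value, not taken from the model card
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because this is a base model rather than an instruction-tuned one, it continues a plain text prompt instead of following a chat template, for example `generate_text("The lighthouse keeper opened the door and")`.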