Model Overview
Lambent/Qwen3-4B-Base-Continued-GRPO-Style-Karcher is a 4-billion-parameter model built on Qwen3-4B-Base. It was created by applying a Karcher Mean merge to multiple adapters, each trained on a specific domain, with a SmolLM2-360M proxy model used to fit style and lower perplexity. The merging process aimed to preserve distribution entropy through an attempted implementation of MARA.
Key Capabilities and Performance
- Improved Perplexity: The model shows a notable reduction in perplexity on the lambada_openai task, achieving a 9.63% decrease compared to the base model and a 3.21% decrease compared to the GRPO-Wave variant.
- Enhanced Diversity: It exhibits significant improvements in text generation diversity, with metrics like Distinct-1 increasing by 10.5% for ao3_english and 3.6% for bbc_news compared to the base model. Pairwise diversity also saw increases in several domains.
- Reasoning Enhancement: An additional experiment involved optimizing <think> tokens in the embedding space, leading to a 96.7% accuracy on GSM8k 3-shot reasoning traces, up from 90.0% with original embeddings, and a 7.8% improvement in CE loss on a held-out evaluation set.
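For readers unfamiliar with the diversity metric cited above, the following is a minimal sketch of Distinct-n (unique n-grams divided by total n-grams) together with a helper for relative percentage change. It is illustrative only: whitespace tokenization, the function names, and the reading of the reported percentages as relative changes are assumptions, and the evaluation code actually used for this model may differ.

```python
def distinct_n(texts, n=1):
    """Distinct-n: unique n-grams / total n-grams over a set of generations.

    Assumption: tokens are obtained by whitespace splitting; the model's own
    evaluation may use a different tokenizer.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()
        # Slide a window of size n over the token list.
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


def relative_change_percent(baseline, new):
    """Relative change of `new` against `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Hypothetical generations, not taken from the model's evaluation.
samples = ["the cat sat on the mat", "the dog sat on the rug"]
print(distinct_n(samples, n=1))
print(distinct_n(samples, n=2))
```

A higher Distinct-1 means fewer repeated words across the sampled generations; comparing the score of a merged model against its base with `relative_change_percent` gives a figure in the same form as those listed above.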
Merge Details
This model was merged using MergeKit with the Karcher Mean method, combining adapters trained on diverse domains including various GitHub programming languages (Python, JavaScript, C++), academic papers (arXiv CS, Math, Physics), and general text (Wikipedia English, BBC News, AO3 English).