Name: Lambent/Qwen3-4B-Base-Continued-GRPO-Style-Karcher API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Lambent

Model Overview

Lambent/Qwen3-4B-Base-Continued-GRPO-Style-Karcher is a 4-billion-parameter model built on the Qwen3-4B-Base architecture. It was created by merging multiple adapters with the Karcher Mean method. Each adapter was trained on a specific domain, using a SmolLM2-360M proxy model, to fit that domain's style and lower perplexity. The merge also included an attempted implementation of MARA, intended to preserve distribution entropy.
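The card does not describe how MARA measures or preserves entropy. As a point of reference only, the sketch below computes the Shannon entropy (in nats) of a next-token distribution from raw logits, and its average over several positions. The function names `token_entropy` and `mean_entropy` are hypothetical and are not part of MARA or MergeKit.

```python
import math
from typing import Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (nats) of softmax(logits) for one position."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [x / total for x in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(positions: Sequence[Sequence[float]]) -> float:
    """Average entropy over several positions (one logit vector each)."""
    return sum(token_entropy(z) for z in positions) / len(positions)


print(token_entropy([0.0, 0.0, 0.0, 0.0]))  # uniform over 4 tokens: ln 4, about 1.386
```

A merge that "preserves entropy" in this sense would keep such averages close to those of the input models, rather than collapsing onto a few high-probability tokens.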

Key Capabilities and Performance

Improved Perplexity: Perplexity on the lambada_openai task is 9.63% lower than the base model's and 3.21% lower than the GRPO-Wave variant's.
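For readers comparing the percentages, here is a minimal sketch of the arithmetic, assuming perplexity is the exponential of the mean per-token negative log-likelihood (natural log). Evaluation harnesses may aggregate differently (for example per word or per document on lambada_openai), and the card does not report absolute perplexities, so no real values are plugged in. `perplexity` and `percent_decrease` are illustrative names.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood (natural-log probabilities)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def percent_decrease(baseline: float, candidate: float) -> float:
    """Relative reduction of candidate vs. baseline, in percent."""
    return 100.0 * (baseline - candidate) / baseline


# A model assigning probability 0.5 to every token has perplexity 2.
print(perplexity([math.log(0.5)] * 4))
```

Under this definition, a 9.63% decrease means the merged model's perplexity is about 0.9037 times the base model's.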
Enhanced Diversity: Text-generation diversity improves. Distinct-1 rises by 10.5% on ao3_english and by 3.6% on bbc_news relative to the base model, and pairwise diversity also increases in several domains.
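Distinct-n is commonly defined as the number of unique n-grams divided by the total number of n-grams across a set of generations. The sketch below follows that definition and assumes whitespace tokenization. The card does not state which tokenizer or sample size was used, and the pairwise diversity metric is not reproduced here.

```python
from typing import Iterable


def distinct_n(texts: Iterable[str], n: int = 1) -> float:
    """Unique n-grams / total n-grams, pooled over all generations."""
    unique = set()
    total = 0
    for text in texts:
        tokens = text.split()  # assumption: whitespace tokenization
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


samples = ["the cat sat", "the dog sat"]
print(distinct_n(samples, 1))  # 4 unique / 6 total
print(distinct_n(samples, 2))  # 4 unique / 4 total = 1.0
```

Because repeated words lower the ratio, a higher Distinct-1 indicates less repetitive vocabulary across samples.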
Reasoning Enhancement: An additional experiment optimized the <think> tokens in the embedding space. This raised accuracy on GSM8K 3-shot reasoning traces to 96.7%, up from 90.0% with the original embeddings, and improved cross-entropy (CE) loss on a held-out evaluation set by 7.8%.
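The card gives no code for the embedding experiment. The toy sketch below only illustrates the general idea of optimizing an input embedding while every other weight stays frozen: a single vector `e` is passed through a fixed linear output layer `W`, and plain gradient descent lowers the cross-entropy on some target token ids. In the real setting the gradient would flow through the full frozen Qwen3 network into the `<think>` embedding rows. `W`, `e`, `lr`, `steps`, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [x / s for x in exps]


def logits(W: Sequence[Sequence[float]], e: Sequence[float]) -> List[float]:
    """Frozen output layer: one logit per row of W."""
    return [sum(w * x for w, x in zip(row, e)) for row in W]


def mean_ce(W, e, targets: Sequence[int]) -> float:
    """Mean cross-entropy of the target ids under softmax(W e)."""
    p = softmax(logits(W, e))
    return -sum(math.log(p[t]) for t in targets) / len(targets)


def optimize_embedding(W, e, targets, lr=0.5, steps=200) -> List[float]:
    """Gradient descent on the embedding e only; W is never modified."""
    e = list(e)  # copy so the caller's vector is not mutated
    for _ in range(steps):
        p = softmax(logits(W, e))
        grad = [0.0] * len(e)
        for t in targets:
            # dL/dlogit_k = p_k - 1[k == t], so dL/de = W^T (p - onehot)
            for k, row in enumerate(W):
                g = p[k] - (1.0 if k == t else 0.0)
                for j, w in enumerate(row):
                    grad[j] += g * w / len(targets)
        e = [x - lr * g for x, g in zip(e, grad)]
    return e


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # frozen "model"
e0 = [0.0, 0.0]                             # initial embedding
e1 = optimize_embedding(W, e0, targets=[0])
print(mean_ce(W, e0, [0]), mean_ce(W, e1, [0]))  # loss goes down
```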

Merge Details

This model was merged using MergeKit with the Karcher Mean method, combining adapters trained on diverse domains: GitHub code in several programming languages (Python, JavaScript, C++), academic papers (arXiv CS, Math, Physics), and general text (Wikipedia English, BBC News, AO3 English).
