Lambent/Qwen3-4B-Base-Continued-GRPO-Merge

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Jan 1, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

Lambent/Qwen3-4B-Base-Continued-GRPO-Merge is a 4 billion parameter language model based on the Qwen3 architecture, developed by Lambent. It takes a CABS-sparsified version of the original GRPO-trained model and merges it into a base model using the TIES method. The merge shows lower perplexity on the lambada_openai task while holding performance roughly steady on reasoning and question-answering benchmarks. It is aimed at text generation workloads that benefit from GRPO-derived knowledge merged into a 4B base model.


Model Overview

Lambent/Qwen3-4B-Base-Continued-GRPO-Merge is a 4 billion parameter language model built upon the Qwen3 architecture. It is a specialized merge that incorporates a CABS (Conflict-Aware and Balanced Sparsification) sparsified version of the original GRPO (Group Relative Policy Optimization) training. That sparsified result was merged with the Lambent/Qwen3-4B-Base-Continued-GRPO-B model using the TIES merge method, which allows sparse GRPO knowledge to be injected into a base model.
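To make the merge step more concrete, here is a minimal sketch of the general TIES procedure (trim, elect sign, disjoint merge) on toy NumPy arrays. It is an illustration under stated assumptions, not the recipe used for this model: the function names, the density value, and the toy tensors are hypothetical, and the trimming shown is plain magnitude-based trimming rather than CABS sparsification.

```python
import numpy as np

def trim_top_k(task_vector, density):
    """Keep roughly the largest-magnitude fraction `density` of entries; zero the rest.
    Ties at the threshold may keep slightly more than that fraction."""
    flat = np.abs(task_vector).ravel()
    k = max(1, int(round(density * flat.size)))
    kth = flat.size - k
    threshold = np.partition(flat, kth)[kth]
    return np.where(np.abs(task_vector) >= threshold, task_vector, 0.0)

def ties_merge(base, finetuned_models, density=0.5, weight=1.0):
    """Toy TIES merge of several fine-tuned tensors into `base` (hypothetical helper)."""
    # 1. Trim: build task vectors (fine-tuned minus base) and sparsify them.
    stacked = np.stack([trim_top_k(m - base, density) for m in finetuned_models])
    # 2. Elect sign: per parameter, the sign with the larger total magnitude,
    #    which equals the sign of the summed trimmed task vectors.
    elected_sign = np.sign(stacked.sum(axis=0))
    # 3. Disjoint merge: average only the non-zero entries that agree with the elected sign.
    agrees = (np.sign(stacked) == elected_sign) & (stacked != 0)
    counts = agrees.sum(axis=0)
    merged_delta = (stacked * agrees).sum(axis=0) / np.maximum(counts, 1)
    return base + weight * merged_delta

# Toy example: two "fine-tuned" tensors that disagree in sign on the first entry.
base = np.zeros(2)
model_a = np.array([1.0, 0.0])
model_b = np.array([-0.4, 0.8])
merged = ties_merge(base, [model_a, model_b], density=1.0)
```

In the toy example, the first parameter keeps only the contribution from model_a because the negative update from model_b loses the sign election. When only a single sparsified task vector is merged into a base, the procedure reduces to adding that sparse delta to the base weights.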

Key Characteristics

  • Architecture: Based on the Qwen3 family, providing a robust foundation for language tasks.
  • Parameter Count: 4 billion parameters, offering a balance between performance and computational efficiency.
  • Merge Method: Utilizes the TIES merge method for combining models, specifically merging a CABS-sparsified GRPO model into a base.
  • Performance Improvements: Demonstrates a 4.6% reduction in perplexity on the lambada_openai task, indicating enhanced language modeling capabilities. It also shows slight improvements in openbookqa accuracy.
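For readers unfamiliar with the metric, the sketch below shows how perplexity is generally derived from per-token log-probabilities and how a relative reduction is computed. The helper names and all numbers are hypothetical and chosen only to illustrate the arithmetic; the source does not state the absolute perplexity values or the exact evaluation setup.

```python
import math

def perplexity(token_logprobs):
    """Perplexity is the exponential of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def relative_reduction(before, after):
    """Fractional drop from `before` to `after` (0.046 corresponds to 4.6%)."""
    return (before - after) / before

# Hypothetical log-probabilities: every token predicted with probability 0.5.
example_ppl = perplexity([math.log(0.5)] * 4)

# Hypothetical before/after perplexities that would yield a 4.6% reduction.
example_drop = relative_reduction(5.00, 4.77)
```

Lower perplexity means the model assigns higher probability to the reference text, so a reduction of a few percent reflects a modest but measurable gain in next-token prediction on that task.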

Intended Use Cases

This model is particularly well-suited for scenarios where optimized knowledge integration and efficient language processing are critical. Its specialized merging technique aims to leverage the strengths of both GRPO training and the base model, making it a candidate for:

  • Language Modeling: Improved perplexity suggests better next-token prediction.
  • Reasoning Tasks: Maintains strong performance on benchmarks like arc_easy and piqa.
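  • Resource-Efficient Deployment: At 4 billion parameters in BF16, the model is small enough for modest hardware. Note that the CABS sparsification applies to the GRPO updates that were merged in, so it should not by itself be assumed to make inference cheaper than for other dense 4B models.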
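As a starting point for trying these use cases, the following sketch loads the model for plain text completion with the Hugging Face transformers library. It assumes the weights are available on the Hugging Face Hub under the repository name shown on this page, that torch and a transformers version with Qwen3 support are installed, and that plain-text prompting is appropriate for a base-style model; the `complete` helper and the prompt are hypothetical.

```python
MODEL_ID = "Lambent/Qwen3-4B-Base-Continued-GRPO-Merge"  # repository name as given on this page

def complete(prompt, max_new_tokens=64):
    """Greedy text completion; downloads the weights on first use (several GB)."""
    # Imported lazily so that defining this helper does not require the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed for this model.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Return only the newly generated tokens, without the prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example call (not executed here because it downloads the model):
# print(complete("The capital of France is"))
```

The helper reloads the model on every call to keep the sketch short; for repeated use, loading the tokenizer and model once and reusing them would be the more practical design.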