What is CriticLeanGPT-Qwen3-14B-RL?
m-a-p/CriticLeanGPT-Qwen3-14B-RL is a 14 billion parameter language model built upon the Qwen3 architecture. It has been fine-tuned using Reinforcement Learning (RL) with the CriticLean_4K dataset, which is a subset of the larger CriticLeanInstruct dataset suite. This RL approach aims to align the model for improved performance, particularly in areas requiring critical evaluation and mathematical reasoning.
Key Characteristics
- Base Model: Qwen3, a powerful large language model.
- Parameter Count: 14 billion parameters.
- Context Length: Supports a 32,768-token (32K) context window.
- Training Methodology: Underwent Reinforcement Learning (RL) using the CriticLean_4K dataset, which is specifically designed for critic-guided learning.
- Dataset Integration: The CriticLeanInstruct dataset, used for training, incorporates samples from OpenR1-Math-220k and OpenThoughts-114k-Code_decontaminated, indicating a focus on mathematical and coding capabilities.
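The model can be loaded like any other Hugging Face causal LM. The sketch below is a minimal, hedged example: the loading calls are standard `transformers` API, but the critique prompt wording and the `build_critique_prompt`/`critique` helpers are illustrative assumptions, not an official interface.

```python
MODEL_ID = "m-a-p/CriticLeanGPT-Qwen3-14B-RL"
MAX_CONTEXT = 32768  # context window stated above


def build_critique_prompt(problem: str, lean_code: str) -> str:
    """Format a (problem, Lean formalization) pair for critique.

    The exact instruction wording here is an assumption for illustration;
    check the model card's chat template for the canonical format.
    """
    return (
        "Evaluate whether the Lean 4 formalization faithfully captures "
        "the mathematical problem.\n\n"
        f"Problem:\n{problem}\n\n"
        f"Formalization:\n{lean_code}\n"
    )


def critique(problem: str, lean_code: str, max_new_tokens: int = 512) -> str:
    """Run one critique pass. Imports are deferred because loading a
    14B-parameter model requires substantial GPU memory."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = build_critique_prompt(problem, lean_code)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

Note that calling `critique(...)` downloads the full 14B-parameter checkpoint; `build_critique_prompt` can be used on its own to inspect the prompt format.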
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Mathematical Formalization: Excels in tasks related to mathematical reasoning and problem-solving due to its RL training on math-centric data.
- Code-related Tasks: Benefits from the inclusion of code data in its training, making it capable of code generation and code understanding.
- Research in RL-based LLM Alignment: Demonstrates an effective application of critic-guided reinforcement learning for model alignment.
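To make the formalization use case concrete, here is the kind of (natural-language statement, Lean 4 formalization) pair a critic model is asked to judge. The example is illustrative only and assumes Mathlib's `Even.add` lemma; it is not drawn from the CriticLean_4K dataset.

```lean
-- Natural-language problem: "The sum of two even integers is even."
-- Candidate formalization a critic model would evaluate for faithfulness:
theorem even_add_even (a b : ℤ) (ha : Even a) (hb : Even b) :
    Even (a + b) :=
  ha.add hb
```

A critic-style model checks whether such a statement faithfully mirrors the informal problem (correct types, hypotheses, and conclusion), not merely whether the proof compiles.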