m-a-p/CriticLeanGPT-Qwen3-14B-RL
m-a-p/CriticLeanGPT-Qwen3-14B-RL is a 14-billion-parameter, Qwen3-based large language model developed by m-a-p and fine-tuned with Reinforcement Learning (RL) on the CriticLean_4K dataset. The model is optimized for mathematical formalization and reasoning tasks, leveraging a dataset designed for critic-guided reinforcement learning. It supports a 32768-token context length, making it suitable for complex problem-solving in math and code domains.
What is CriticLeanGPT-Qwen3-14B-RL?
m-a-p/CriticLeanGPT-Qwen3-14B-RL is a 14 billion parameter language model built upon the Qwen3 architecture. It has been fine-tuned using Reinforcement Learning (RL) with the CriticLean_4K dataset, which is a subset of the larger CriticLeanInstruct dataset suite. This RL approach aims to align the model for improved performance, particularly in areas requiring critical evaluation and mathematical reasoning.
Key Characteristics
- Base Model: Qwen3, a powerful large language model.
- Parameter Count: 14 billion parameters.
- Context Length: Supports a substantial context window of 32768 tokens.
- Training Methodology: Underwent Reinforcement Learning (RL) using the CriticLean_4K dataset, which is specifically designed for critic-guided learning.
- Dataset Integration: The CriticLeanInstruct dataset, used for training, incorporates samples from OpenR1-Math-220k and OpenThoughts-114k-Code_decontaminated, indicating a focus on mathematical and coding capabilities.
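The characteristics above fit the standard Hugging Face `transformers` chat workflow. The sketch below is a minimal, hedged example of how such a model might be prompted for formalization: the system prompt and the helper names (`build_messages`, `generate`) are illustrative assumptions, not part of the model card; only the model ID and context length come from the text above.

```python
# Minimal sketch (assumptions noted in comments); not an official usage recipe.

MODEL_ID = "m-a-p/CriticLeanGPT-Qwen3-14B-RL"
MAX_CONTEXT = 32768  # context window stated in the model card


def build_messages(problem: str) -> list[dict]:
    """Wrap a natural-language math problem in a chat-style message list.

    The system prompt is an illustrative assumption, not a documented default.
    """
    system = (
        "You are a mathematical formalization assistant. "
        "Translate the problem into a formal Lean 4 statement."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": problem},
    ]


def generate(problem: str, max_new_tokens: int = 1024) -> str:
    """Load the model and generate a formalization.

    Requires the model weights and (realistically, for 14B parameters) a GPU;
    nothing heavy runs at import time.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Render the chat messages with the model's own chat template.
    prompt = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


msgs = build_messages("Show that the sum of two even integers is even.")
print(msgs[1]["role"])  # → user
```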
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Mathematical Formalization: Excels in tasks related to mathematical reasoning and problem-solving due to its RL training on math-centric data.
- Code-related Tasks: Benefits from the code data included in its training, making it suitable for code generation and code understanding.
- Research in RL-based LLM Alignment: Demonstrates an effective application of critic-guided reinforcement learning for model alignment.
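As a concrete illustration of the formalization use case, the snippet below shows the kind of Lean 4 statement such a model is typically asked to produce. The theorem and proof are our own example in Mathlib-style notation, not output from this model, and the exact Mathlib API may differ.

```lean
-- Illustrative formalization target: "the sum of two even integers is even".
-- Uses Mathlib's `Even` (∃ r, a = r + r); `omega` closes the linear goal.
theorem sum_of_evens {a b : ℤ} (ha : Even a) (hb : Even b) : Even (a + b) := by
  obtain ⟨m, hm⟩ := ha
  obtain ⟨n, hn⟩ := hb
  exact ⟨m + n, by omega⟩
```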