Overview
m-a-p/CriticLeanGPT-Qwen3-8B-RL is a language model based on the Qwen3 architecture, fine-tuned with Reinforcement Learning (RL). It is trained on the CriticLean_4K dataset, a subset of the broader CriticLeanInstruct dataset suite designed for aligning large language models through SFT and RL.
Key Capabilities
- Reinforcement Learning Alignment: The model has undergone RL training on the CriticLean_4K dataset, which focuses on critical evaluation data.
- Mathematical and Code Reasoning: While this specific variant is RL-only, the underlying CriticLean dataset suite incorporates samples from OpenR1-Math-220k and OpenThoughts-114k-Code_decontaminated, suggesting an emphasis on improving performance in these domains.
- Critic-Guided Training: The model's training is informed by CriticLean critic data, optimizing it to generate responses that hold up under critical evaluation.
Good For
- Tasks requiring critical evaluation: Ideal for applications where the model's output needs to be logically sound and well-reasoned.
- Research in RL-based LLM alignment: Serves as a practical example of a model aligned through RL using a specialized dataset.
- Developing applications needing enhanced reasoning: Particularly in areas that benefit from mathematical or code-related understanding, given the dataset's composition.
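A minimal usage sketch with Hugging Face `transformers` is shown below. The repo id comes from this card; the critic-style prompt wording, sampling settings, and helper names (`build_messages`, `generate`) are illustrative assumptions, not part of an official API:

```python
MODEL_ID = "m-a-p/CriticLeanGPT-Qwen3-8B-RL"

def build_messages(statement: str) -> list[dict]:
    # Frame a critic-style request -- the kind of critical-evaluation task
    # the CriticLean_4K training data targets. The exact wording is an
    # illustrative assumption, not a prescribed prompt format.
    return [
        {
            "role": "user",
            "content": (
                "Critically evaluate the following Lean formalization "
                "and point out any errors:\n" + statement
            ),
        }
    ]

def generate(statement: str, max_new_tokens: int = 512) -> str:
    # Deferred import so the prompt-building helper above stays usable
    # without the (heavy) transformers dependency installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Apply the model's chat template, then decode only the newly
    # generated tokens (everything after the prompt).
    prompt = tokenizer.apply_chat_template(
        build_messages(statement), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

For example, `generate("theorem t : 1 + 1 = 2 := by rfl")` would return the model's critique of that formalization; an 8B model typically needs a GPU with roughly 16 GB of memory at half precision.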