m-a-p/CriticLeanGPT-Qwen3-8B-RL

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Jul 8, 2025 · Architecture: Transformer

CriticLeanGPT-Qwen3-8B-RL is a Qwen3-based language model developed by m-a-p, fine-tuned with reinforcement learning (RL) on the CriticLean_4K dataset. The model is aligned for tasks that require critical evaluation and reasoning, drawing on a dataset that includes mathematical and coding samples, and is intended to improve performance in settings where precise, well-structured responses matter.


Overview

m-a-p/CriticLeanGPT-Qwen3-8B-RL is a language model based on the Qwen3 architecture, fine-tuned using reinforcement learning (RL). It is trained on the CriticLean_4K dataset, a subset of the broader CriticLeanInstruct dataset suite designed for aligning large language models through supervised fine-tuning (SFT) and RL.

Key Capabilities

  • Reinforcement Learning Alignment: The model has undergone RL training using the CriticLean_4K dataset, which focuses on critical evaluation data.
  • Mathematical and Code Reasoning: While this specific variant is RL-only, the underlying CriticLean dataset suite incorporates samples from OpenR1-Math-220k and OpenThoughts-114k-Code_decontaminated, suggesting an emphasis on improving performance in these domains.
  • Critic-Guided Training: The model's training is informed by critic-oriented data from the CriticLean suite, indicating optimization toward responses that are critically evaluated for soundness.

Good For

  • Tasks requiring critical evaluation: Ideal for applications where the model's output needs to be logically sound and well-reasoned.
  • Research in RL-based LLM alignment: Serves as a practical example of a model aligned through RL using a specialized dataset.
  • Developing applications needing enhanced reasoning: Particularly in areas that benefit from mathematical or code-related understanding, given the dataset's composition.
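For readers who want to try the model, here is a minimal quick-start sketch using the Hugging Face `transformers` library. This is not from the model card: the chat-template usage follows standard Qwen3 conventions, and the prompt wording, helper names, and generation settings are illustrative assumptions.

```python
# Hypothetical quick-start sketch (assumptions noted above), loading
# m-a-p/CriticLeanGPT-Qwen3-8B-RL via Hugging Face transformers.
MODEL_ID = "m-a-p/CriticLeanGPT-Qwen3-8B-RL"


def build_messages(statement: str) -> list[dict]:
    """Wrap a statement to be critically evaluated in a chat-style message list.

    The instruction wording here is illustrative, not prescribed by the model card.
    """
    return [{"role": "user", "content": f"Critically evaluate the following: {statement}"}]


def generate_critique(statement: str, max_new_tokens: int = 512) -> str:
    """Run one generation pass and return only the newly generated text."""
    # Deferred import so build_messages stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Apply the model's chat template to format the conversation as a prompt string.
    prompt = tokenizer.apply_chat_template(
        build_messages(statement), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the tokens generated after the prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

A typical call would be `generate_critique("Every bounded monotone real sequence converges.")`; with an 8B model, a GPU with sufficient memory (or an FP8/quantized runtime) is advisable.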