Overview
m-a-p/CriticLeanGPT-Qwen3-8B-RL is a language model based on the Qwen3 architecture, fine-tuned with Reinforcement Learning (RL). It is trained on the CriticLean_4K dataset, a subset of the broader CriticLeanInstruct dataset suite designed for aligning large language models through SFT and RL.
Key Capabilities
- Reinforcement Learning Alignment: The model has undergone RL training on the CriticLean_4K dataset, which focuses on critical evaluation data.
- Mathematical and Code Reasoning: While this specific variant is RL-only, the underlying CriticLean dataset suite incorporates samples from OpenR1-Math-220k and OpenThoughts-114k-Code_decontaminated, suggesting an emphasis on improving performance in these domains.
- Critic-Guided Training: The model's training is informed by CriticLean critic data, optimizing it to generate responses that hold up under critical evaluation.
Good For
- Tasks requiring critical evaluation: Ideal for applications where the model's output needs to be logically sound and well-reasoned.
- Research in RL-based LLM alignment: Serves as a practical example of a model aligned through RL using a specialized dataset.
- Developing applications needing enhanced reasoning: Particularly in areas that benefit from mathematical or code-related understanding, given the dataset's composition.
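A minimal usage sketch with Hugging Face `transformers` is shown below. The repo id comes from this card; the critic-style prompt wording, sampling settings, and helper names (`build_messages`, `generate`) are illustrative assumptions, not part of an official API:

```python
MODEL_ID = "m-a-p/CriticLeanGPT-Qwen3-8B-RL"

def build_messages(statement: str) -> list[dict]:
    # Frame a critic-style request -- the kind of critical-evaluation task
    # the CriticLean_4K training data targets. The exact wording is an
    # illustrative assumption, not a prescribed prompt format.
    return [
        {
            "role": "user",
            "content": (
                "Critically evaluate the following Lean formalization "
                "and point out any errors:\n" + statement
            ),
        }
    ]

def generate(statement: str, max_new_tokens: int = 512) -> str:
    # Deferred import so the prompt-building helper above stays usable
    # without the (heavy) transformers dependency installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Apply the model's chat template, then decode only the newly
    # generated tokens (everything after the prompt).
    prompt = tokenizer.apply_chat_template(
        build_messages(statement), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

For example, `generate("theorem t : 1 + 1 = 2 := by rfl")` would return the model's critique of that formalization; an 8B model typically needs a GPU with roughly 16 GB of memory at half precision.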