m-a-p/CriticLeanGPT-Qwen3-14B-RL

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 14B · Quant: FP8 · Ctx Length: 32k · Published: Jul 10, 2025 · Architecture: Transformer

m-a-p/CriticLeanGPT-Qwen3-14B-RL is a 14-billion-parameter, Qwen3-based large language model developed by m-a-p and fine-tuned with Reinforcement Learning (RL) on the CriticLean_4K dataset, which is designed for critic-guided reinforcement learning. The model is optimized for mathematical formalization and reasoning tasks, and its 32,768-token context length makes it suitable for complex problem solving in math and code domains.


What is CriticLeanGPT-Qwen3-14B-RL?

m-a-p/CriticLeanGPT-Qwen3-14B-RL is a 14-billion-parameter language model built on the Qwen3 architecture. It was fine-tuned using Reinforcement Learning (RL) with the CriticLean_4K dataset, a subset of the larger CriticLeanInstruct dataset suite. This RL approach aligns the model for improved performance, particularly in tasks requiring critical evaluation and mathematical reasoning.

Key Characteristics

  • Base Model: Qwen3, a powerful large language model.
  • Parameter Count: 14 billion parameters.
  • Context Length: Supports a context window of 32,768 tokens.
  • Training Methodology: Underwent Reinforcement Learning (RL) using the CriticLean_4K dataset, which is specifically designed for critic-guided learning.
  • Dataset Integration: The CriticLeanInstruct dataset, used for training, incorporates samples from OpenR1-Math-220k and OpenThoughts-114k-Code_decontaminated, indicating a focus on mathematical and coding capabilities.
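The characteristics above imply a standard text-generation workflow. The sketch below is a hypothetical usage example, not one shipped with this card: it assumes the checkpoint is available under the id `m-a-p/CriticLeanGPT-Qwen3-14B-RL` on the Hugging Face Hub, that the Qwen3 chat template applies, and that you have enough GPU memory for a 14B model. The `formalize` helper and its prompt wording are illustrative choices, not part of the model's documented interface.

```python
# Hedged sketch: querying the model via Hugging Face transformers.
# Assumes the `transformers` and `torch` packages and sufficient GPU memory;
# the first call downloads the 14B checkpoint from the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "m-a-p/CriticLeanGPT-Qwen3-14B-RL"  # model id from this card

def formalize(statement: str, max_new_tokens: int = 512) -> str:
    """Ask the model to formalize a mathematical statement (illustrative prompt)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": f"Formalize in Lean 4: {statement}"}]
    # Build the prompt with the model's own chat template.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(formalize("the sum of two even integers is even"))
```

Loading with `torch_dtype="auto"` defers the precision choice to the checkpoint's configuration; for production serving, an inference engine such as vLLM would be the more typical choice for a model of this size.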

Ideal Use Cases

This model is particularly well-suited for applications requiring:

  • Mathematical Formalization: Excels in tasks related to mathematical reasoning and problem-solving due to its RL training on math-centric data.
  • Code-related Tasks: Benefits from the inclusion of code data in its training, making it well-suited to code generation and code understanding.
  • Research in RL-based LLM Alignment: Demonstrates an effective application of critic-guided reinforcement learning for model alignment.
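To illustrate what "mathematical formalization" means as an output target, a statement such as "the sum of two even integers is even" could be rendered in Lean 4 roughly as follows. This is a hand-written sketch of the target format (assuming Mathlib's `Even` definition, `Even n ↔ ∃ r, n = r + r`), not actual output from the model:

```lean
-- Hypothetical Lean 4 formalization of "the sum of two even integers is even".
-- Hand-written illustration, not model output.
theorem even_add_even (a b : Int) (ha : Even a) (hb : Even b) : Even (a + b) := by
  obtain ⟨m, hm⟩ := ha   -- a = m + m
  obtain ⟨n, hn⟩ := hb   -- b = n + n
  exact ⟨m + n, by rw [hm, hn]; ring⟩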