PrimeIntellect/INTELLECT-2

Hugging Face
TEXT GENERATIONConcurrency Cost:2Model Size:32.8BQuant:FP8Ctx Length:32kPublished:May 11, 2025License:apache-2.0Architecture:Transformer0.2K Open Weights Warm

INTELLECT-2 is a 32 billion parameter language model developed by PrimeIntellect, trained using a distributed asynchronous reinforcement learning framework. Based on the Qwen2 architecture, it specializes in mathematical and coding tasks, demonstrating improved performance over its base model, QwQ-32B, in these domains. The model was trained leveraging community-contributed GPU resources and features a 131072 token context length.

Loading preview...

INTELLECT-2: A Reasoning Model for Math and Code

INTELLECT-2 is a 32 billion parameter language model from PrimeIntellect, distinguished by its training methodology: a distributed asynchronous reinforcement learning (RL) run utilizing globally contributed GPU resources. Built upon the qwen2 architecture, this model is specifically optimized for complex mathematical and coding challenges.

Key Capabilities & Features

  • Reinforcement Learning Training: Trained with prime-rl, a framework for distributed asynchronous RL using GRPO over verifiable rewards, enhancing stability.
  • Specialized Performance: Demonstrates improved scores over its base model, QwQ-32B, in mathematical benchmarks (AIME24, AIME25) and coding tasks (LiveCodeBench v5).
  • Context Length: Features a substantial 131072 token context length.
  • Length Control: Optimized for best results when prompted with "Think for 10000 tokens before giving a response," though other specific lengths (2000, 4000, 6000, 8000) also yield strong performance.
  • Compatibility: Compatible with popular inference engines like vllm and sglang.

Use Cases & Considerations

INTELLECT-2 is particularly well-suited for applications requiring strong reasoning in mathematics and code generation/analysis. While excelling in these areas, its performance on general instruction following (IFEval) saw a slight decrease compared to its base model, suggesting a focused specialization. Developers should leverage its length control prompting for optimal output quality in its target domains. For more technical details, refer to the technical report.