POLARIS-Project/Polaris-7B-Preview

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · Published: Jun 12, 2025 · License: apache-2.0 · Architecture: Transformer

POLARIS-Project/Polaris-7B-Preview is a 7.6-billion-parameter language model developed by POLARIS-Project and enhanced through a post-training reinforcement learning (RL) recipe. The model specializes in advanced reasoning tasks and shows significant gains on challenging benchmarks, surpassing some commercial systems on specific reasoning evaluations while relying only on open-source data and academic-scale resources. It is built on base models such as Qwen3-4B and DeepSeek-R1-Distill-Qwen-7B and is optimized for complex problem-solving.


POLARIS-7B-Preview: Advanced Reasoning through RL Post-Training

POLARIS-7B-Preview is a 7.6-billion-parameter model developed by POLARIS-Project, distinguished by a post-training recipe that applies reinforcement learning (RL) to substantially enhance advanced reasoning capabilities. This approach has been shown to lift the performance of base models, such as Qwen3-4B, on complex reasoning tasks.

Key Capabilities & Recipe Highlights

  • Reinforcement Learning Scaling: Utilizes a unique RL recipe to refine and scale reasoning abilities, pushing the boundaries of open-recipe models.
  • Data Difficulty Analysis: Analyzes data difficulty before training and recommends a mirrored J-shaped difficulty distribution, biased toward challenging problems, for optimal training.
  • Diversity-Based Rollout: Leverages diversity among rollouts to dynamically adjust sampling temperature during RL training.
  • Inference-Time Length Extrapolation: Incorporates techniques for generating longer Chains of Thought (CoT) at inference, enabling a "train-short, generate-long" paradigm to mitigate computational burdens.
  • Enhanced Exploration Efficiency: Achieves improved exploration through multi-stage training, allowing the model to "think longer" from the outset.
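The difficulty-biased data selection above can be sketched as a weighted sampler over per-problem solve rates. This is a minimal illustration, not the project's actual pipeline; the function name, the `hard_bias` exponent, and the use of `(1 - solve_rate)` as a difficulty proxy are all assumptions standing in for the mirrored J-shaped distribution described in the recipe.

```python
import random

def sample_by_difficulty(problems, solve_rates, n, hard_bias=3.0, rng=None):
    """Sample a training batch biased toward hard problems.

    `problems` and `solve_rates` are parallel lists; each solve rate is
    the fraction of base-model rollouts that already answer the problem
    correctly. Weighting by (1 - solve_rate) ** hard_bias skews sampling
    toward low-solve-rate (hard) items -- a rough stand-in for the
    mirrored J-shaped difficulty distribution.
    """
    rng = rng or random.Random(0)
    weights = [(1.0 - r) ** hard_bias for r in solve_rates]
    return rng.choices(problems, weights=weights, k=n)
```

With `hard_bias=3.0`, a problem the base model solves 10% of the time is weighted roughly 700x more than one it solves 90% of the time, so batches concentrate on the frontier of the model's ability.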
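The diversity-based rollout idea can likewise be sketched as a simple feedback loop on sampling temperature. The diversity metric here (fraction of unique completions in a batch) and the step-size controller are illustrative assumptions; the actual recipe's measure of rollout diversity may differ.

```python
def adjust_temperature(rollouts, temp, target_diversity=0.6, step=0.05,
                       t_min=0.5, t_max=1.5):
    """Nudge sampling temperature toward a target rollout diversity.

    Diversity is approximated as the fraction of distinct completions
    in the rollout batch. When rollouts collapse to few distinct
    answers, temperature rises to encourage exploration; when they are
    already diverse, it falls to sharpen the policy. The result is
    clamped to [t_min, t_max].
    """
    diversity = len(set(rollouts)) / len(rollouts)
    if diversity < target_diversity:
        temp += step
    elif diversity > target_diversity:
        temp -= step
    return min(max(temp, t_min), t_max), diversity
```

Called once per RL step, this keeps exploration pressure roughly constant as the policy sharpens during training.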
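For the "train-short, generate-long" paradigm, one common family of inference-time length-extrapolation tricks rescales the rotary position embedding (RoPE) base so positions beyond the training window stay in-distribution. Whether the released model uses this exact NTK-style variant is an assumption; the sketch only illustrates the general mechanism.

```python
def ntk_scaled_base(base, head_dim, factor):
    """NTK-style RoPE base rescaling for longer-than-trained contexts.

    `base` is the original rotary base (commonly 10000.0), `head_dim`
    the per-head embedding dimension, and `factor` the ratio of target
    context length to training context length. Raising the base by
    factor ** (head_dim / (head_dim - 2)) stretches the low-frequency
    rotary dimensions, letting the model attend over a longer window
    without retraining.
    """
    return base * factor ** (head_dim / (head_dim - 2))
```

A factor of 1.0 leaves the base unchanged; extending, say, a 32k-trained window to 128k would use `factor=4.0`, producing a larger base and therefore slower-rotating low frequencies.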

Performance & Benchmarks

POLARIS-7B-Preview demonstrates strong performance across mathematical and reasoning benchmarks, often outperforming other 7B-class models and even some commercial systems. For instance, it achieves 72.6 on AIME24 (avg@32) and 89.0 on AMC23 (avg@8), showcasing its proficiency in advanced problem-solving. The model's training and evaluation codebase is built upon Verl, and its reward function is derived from DeepScaleR.
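The avg@k metrics quoted above average accuracy over k independently sampled attempts per problem. A minimal sketch of that computation (the function name and input shape are illustrative, not from the evaluation codebase):

```python
def avg_at_k(per_attempt_correct):
    """avg@k: mean per-problem accuracy over k sampled attempts.

    `per_attempt_correct` is a list with one entry per problem; each
    entry is a list of k booleans, one per sampled attempt. avg@32 on
    AIME24 means 32 attempts per problem, with per-problem accuracies
    averaged across the benchmark.
    """
    per_problem = [sum(attempts) / len(attempts)
                   for attempts in per_attempt_correct]
    return sum(per_problem) / len(per_problem)
```

Averaging over many samples gives a lower-variance estimate than a single greedy decode, which matters on small benchmarks like AIME24 (30 problems).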