POLARIS-7B-Preview: Advanced Reasoning through RL Post-Training
POLARIS-7B-Preview is a 7.6-billion-parameter model developed by POLARIS-Project, distinguished by an innovative post-training method that applies reinforcement learning (RL) to significantly enhance advanced reasoning capabilities. The same recipe has been shown to elevate the performance of base models, such as Qwen3-4B, on complex reasoning tasks.
Key Capabilities & Recipe Highlights
- Reinforcement Learning Scaling: Utilizes a unique RL recipe to refine and scale reasoning abilities, pushing the boundaries of open-recipe models.
- Data Difficulty Analysis: Maps the difficulty of training data before RL begins, recommending a mirrored J-shaped difficulty distribution biased toward challenging problems for optimal training.
- Diversity-Based Rollout: Leverages diversity among rollouts to dynamically adjust sampling temperature during RL training.
- Inference-Time Length Extrapolation: Incorporates techniques for generating longer Chains of Thought (CoT) at inference than were seen during training, enabling a "train-short, generate-long" paradigm that reduces the computational cost of long-CoT training.
- Enhanced Exploration Efficiency: Achieves improved exploration through multi-stage training, allowing the model to "think longer" from the outset.
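The data-difficulty idea above can be sketched as a pass-rate filter: estimate each problem's solve rate with the current model, then resample so the retained set skews hard. This is a minimal sketch, not the released recipe; `estimate_pass_rate`, the sampling budget, and the exact retention curve are all illustrative assumptions.

```python
import random

def estimate_pass_rate(problem, model, n_samples=8):
    """Fraction of n_samples rollouts that reach the reference answer.
    (model.solve is a hypothetical interface for illustration.)"""
    correct = sum(model.solve(problem) == problem["answer"] for _ in range(n_samples))
    return correct / n_samples

def filter_by_difficulty(problems, pass_rates, keep_prob):
    """Resample so the retained set is biased toward low pass-rate (hard) items.

    keep_prob maps a pass rate in [0, 1] to a retention probability;
    a decreasing keep_prob yields a mirrored J-shaped difficulty profile
    (many hard problems, few easy ones).
    """
    kept = []
    for problem, rate in zip(problems, pass_rates):
        if random.random() < keep_prob(rate):
            kept.append(problem)
    return kept

# Example: retain nearly all unsolved problems, drop most easy ones.
problems = [{"id": i} for i in range(1000)]
rates = [random.random() for _ in problems]
hard_biased = filter_by_difficulty(problems, rates, keep_prob=lambda r: (1 - r) ** 2)
```

A quadratic retention curve is just one choice; any decreasing `keep_prob` produces the hard-biased shape.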
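One way to realize diversity-based rollout control (a hypothetical sketch; POLARIS's actual controller, diversity measure, and thresholds are not specified here) is to measure how distinct a batch of rollouts is and nudge the sampling temperature up when they collapse onto the same trajectory:

```python
def rollout_diversity(rollouts):
    """Fraction of distinct rollouts in a batch: 1.0 means all unique."""
    return len(set(rollouts)) / len(rollouts)

def adjust_temperature(temp, rollouts, target=0.8, step=0.05, lo=0.6, hi=1.5):
    """Raise temperature when rollouts are too similar, lower it when
    diversity exceeds the target; clamp to [lo, hi]. All constants here
    are illustrative, not the released recipe's values."""
    div = rollout_diversity(rollouts)
    if div < target:
        temp += step
    elif div > target:
        temp -= step
    return max(lo, min(hi, temp))

# Identical rollouts (diversity 0.125) push the temperature up;
# fully distinct rollouts (diversity 1.0) pull it back down.
warmer = adjust_temperature(1.0, ["same proof"] * 8)
cooler = adjust_temperature(1.0, [f"proof {i}" for i in range(8)])
```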
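Length extrapolation of the "train-short, generate-long" kind is commonly achieved by rescaling rotary position embeddings at inference, e.g. with YaRN. A hedged sketch of what such a configuration fragment might look like for a Qwen-style model; the scaling factor and context length below are illustrative, not POLARIS's actual settings:

```python
# Hypothetical rope_scaling entry (values illustrative): YaRN-style
# scaling lets a model trained on shorter CoTs decode longer chains.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,                              # extend usable context ~2x
    "original_max_position_embeddings": 32768,  # trained context length
}
```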
Performance & Benchmarks
POLARIS-7B-Preview demonstrates strong performance across various mathematical and reasoning benchmarks, often outperforming other 7B-class models and even some commercial systems. For instance, it achieves 72.6 on AIME24 avg@32 and 89.0 on AMC23 avg@8, showcasing its proficiency in advanced problem-solving. The model's training and evaluation codebase is built upon Verl, and its reward function is derived from DeepScaleR.
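The avg@k numbers above denote mean accuracy over k independently sampled attempts per problem. A minimal sketch of the metric (the correctness judgments are assumed to come from an external scorer):

```python
def avg_at_k(attempt_correct):
    """avg@k: mean per-problem accuracy over k sampled attempts.

    attempt_correct: one list of booleans per problem, each of length k
    (one entry per sampled solution, True if the attempt was correct).
    """
    per_problem = [sum(attempts) / len(attempts) for attempts in attempt_correct]
    return sum(per_problem) / len(per_problem)

# Two problems, 4 attempts each: accuracies 0.75 and 0.25 average to 0.5.
score = avg_at_k([[True, True, True, False], [False, True, False, False]])
```

Averaging over many samples (k=32 for AIME24, k=8 for AMC23) reduces the variance that single-sample scoring would have on small benchmarks.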