Intelligent-Internet/II-Thought-1.5B-Preview

Source: Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quantization: BF16 · Context Length: 32K · Architecture: Transformer · Published: Mar 24, 2025

II-Thought-1.5B-Preview is a 1.5-billion-parameter, Reinforcement Learning-enhanced language model developed by Intelligent-Internet. It is trained on a 50K math subset of the II-Thought-RL-v0 dataset and optimized for mathematical reasoning and problem-solving. The model performs strongly across math benchmarks, often outperforming its base model and other 1.5B math-focused models.


Overview

II-Thought-1.5B-Preview is a 1.5 billion parameter language model developed by Intelligent-Internet, enhanced through Reinforcement Learning (RL). It is built upon the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model and trained with the GRPO (Group Relative Policy Optimization) algorithm within the ii_thought framework.
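GRPO forgoes a learned value function: for each prompt it samples a group of responses, scores them with a reward function, and normalizes each reward against the group's own statistics to obtain advantages. Below is a minimal, illustrative sketch of that group-relative advantage in Python; the helper name is hypothetical and nothing here is taken from the ii_thought codebase.

```python
# Illustrative sketch of the group-relative advantage behind GRPO.
# Hypothetical helper; not the ii_thought implementation.
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each sampled response's reward against its group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]  # identical rewards carry no learning signal
    return [(r - mean) / std for r in rewards]

# Example: four sampled responses to one prompt, scored by the reward function.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.5]))
```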

Key Capabilities & Training

This preview model was trained specifically on a 50K math subset of the large-scale, multi-task II-Thought-RL-v0 dataset. Its training uses a reward signal that scores both answer correctness and format correctness, aiming to improve the quality and structure of generated responses. The model's maximum context length is 32,768 tokens.
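To illustrate how such a composite reward could be wired up, here is a hedged sketch; the helper functions and weights are hypothetical, and the real scoring logic lives in the ii_thought framework.

```python
# Hypothetical sketch of a composite reward combining answer correctness
# with format correctness. Weights and helpers are illustrative only.
import re

BOXED = re.compile(r"\\boxed\{([^}]*)\}")

def format_reward(response: str) -> float:
    # Reward responses that present a final answer inside \boxed{...},
    # the output format the model card recommends requesting.
    return 1.0 if BOXED.search(response) else 0.0

def answer_reward(response: str, gold: str) -> float:
    # Compare the extracted boxed answer against the reference answer.
    match = BOXED.search(response)
    return 1.0 if match and match.group(1).strip() == gold.strip() else 0.0

def combined_reward(response: str, gold: str,
                    w_answer: float = 0.9, w_format: float = 0.1) -> float:
    # Hypothetical weighting; the actual system's weights are not documented here.
    return w_answer * answer_reward(response, gold) + w_format * format_reward(response)

print(combined_reward(r"The sum is \boxed{5050}.", "5050"))  # 1.0
```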

Performance Highlights

II-Thought-1.5B-Preview shows significant improvements in mathematical reasoning benchmarks compared to its base model and Qwen2.5-Math-1.5B-Instruct. It achieves an average score of 49.90% across various math and reasoning tasks, including AMC23 (79.77%), AIME24 (34.17%), Olympiad Bench (52.78%), and Math500 (87.2%). It also demonstrates improved performance on LiveCodeBench (19.84%) and IFEval (44.84%).

Usage Guidelines

For optimal performance, set the sampling parameters to temperature = 0.6 and top_p = 0.95. When tackling mathematical problems, explicitly request step-by-step reasoning and ask for the final answer inside \boxed{}.
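The sketch below applies those guidelines with the standard Hugging Face transformers API, assuming the checkpoint ships the usual chat template; the prompt wording and max_new_tokens budget are illustrative.

```python
# Minimal inference sketch following the recommended settings.
# Assumes the standard transformers API and a chat template on the checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intelligent-Internet/II-Thought-1.5B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

messages = [{
    "role": "user",
    "content": "Solve step by step, and put your final answer in \\boxed{}: "
               "What is the sum of the first 100 positive integers?",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.6,      # recommended sampling temperature
    top_p=0.95,           # recommended nucleus sampling threshold
    max_new_tokens=2048,  # illustrative budget for the reasoning trace
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```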