Model Overview
Jackrong/gpt-oss-120b-Distill-Llama3.1-8B-v1 is an 8-billion-parameter model developed by Soren, built on the Meta-Llama-3.1-8B base. Its core innovation is a two-stage training process designed to inject advanced reasoning abilities, particularly for mathematical problems. In the first stage, the model was supervised fine-tuned (SFT) on approximately 420 million tokens, distilling knowledge and Chain-of-Thought (CoT) reasoning styles from powerful teacher models such as gpt-oss-120b-high and Qwen3-235B; the second stage refines this behavior with reinforcement learning via Group Relative Policy Optimization (GRPO).
Key Capabilities
- Enhanced Reasoning: Specialized in logical and mathematical reasoning, trained to generate detailed, structured thought processes (enclosed in `<think>...</think>` tags) before providing solutions.
- Knowledge Distillation: Absorbs high-quality reasoning data from larger, more capable teacher models across various domains (STEM, economics, social sciences).
- Reinforcement Learning (GRPO): Utilizes Group Relative Policy Optimization to autonomously explore and optimize reasoning strategies, moving beyond simple imitation.
- Multilingual Support: Incorporates both English and Chinese reasoning data, enhancing its capabilities in both languages.
- Self-Reflection: Demonstrates a tendency for self-reflection and correction within its reasoning chains, indicating an internal standard for logical judgment.
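Because the model emits its reasoning inside `<think>...</think>` tags before the final solution, downstream code usually needs to separate the reasoning trace from the answer. A minimal sketch of such a parser (the tag format comes from this card; the helper name `split_reasoning` is hypothetical):

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured;
# re.DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    Returns the content of the first <think>...</think> block and the
    remaining text with that block removed. If no block is present,
    the reasoning part is empty.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

completion = "<think>2 + 2 = 4, so the answer is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(completion)
```

Keeping the parse tolerant of a missing block is useful here, since (as noted under Limitations) RL-tuned models can occasionally skip or truncate the reasoning section.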
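The key idea behind GRPO, as opposed to PPO-style methods, is that it needs no learned value function: for each prompt it samples a group of completions and scores each one relative to the group's own reward statistics. A sketch of that baseline computation under standard GRPO assumptions (the function name is illustrative, not from this model's training code):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each completion's reward against its group's mean and std.

    GRPO replaces a learned critic with this group baseline: completions
    scoring above their group's average receive positive advantages and
    are reinforced; below-average ones are penalized.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# E.g. four sampled solutions to one math problem, rewarded 1.0 if the
# final answer is correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

This result-based reward signal is also the likely source of the "Result-Oriented Bias" listed below: only the final answer is scored, so verbosity in general conversation is never rewarded.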
Limitations
- Resource Constraints: Performance may not match top-tier specialized models due to limited training resources.
- Result-Oriented Bias: The strong focus on correctness in mathematical problems during RL might lead to overly concise responses in general conversations.
- Language Mixing: May occasionally mix Chinese and English in its output due to mixed training data.
- Imbalanced Capabilities: Strong in algebra word problems but potentially weaker in other specialized fields or general chat.
- No External Tool Use: Lacks the ability to call external tools like calculators or search engines, limiting its precision for complex problems requiring external knowledge.