Jackrong/gpt-oss-120b-Distill-Llama3.1-8B-v1

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Oct 3, 2025 · License: llama3.1 · Architecture: Transformer

Jackrong/gpt-oss-120b-Distill-Llama3.1-8B-v1 is an 8-billion-parameter language model developed by Soren, based on Meta-Llama-3.1-8B. It is fine-tuned through a two-stage process of knowledge distillation followed by reinforcement learning to strengthen its logical and mathematical reasoning. The model excels at generating detailed, structured chains of thought for complex problem-solving, particularly in mathematical domains, and supports both English and Chinese.


Model Overview

Jackrong/gpt-oss-120b-Distill-Llama3.1-8B-v1 is an 8-billion-parameter model developed by Soren, built on the Meta-Llama-3.1-8B base. Its core innovation is a two-stage training process designed to inject advanced reasoning abilities, particularly for mathematical problems. In the first stage, the model underwent supervised fine-tuning (SFT) on approximately 420 million tokens, distilling knowledge and Chain-of-Thought (CoT) reasoning styles from stronger teacher models such as gpt-oss-120b-high and Qwen3-235B.
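Because the distilled CoT style wraps the model's reasoning in `<think>...</think>` tags before the final answer, downstream code typically separates the two. A minimal sketch of that post-processing step (the `split_reasoning` helper is illustrative, not part of the model's tooling):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the chain-of-thought (inside <think>...</think> tags)
    from the final answer that follows it."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        # No reasoning block emitted; treat the whole output as the answer.
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer

cot, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
```

For multi-turn use, the extracted `cot` is usually dropped from the conversation history and only `answer` is carried forward, keeping the context window free for new reasoning.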

Key Capabilities

  • Enhanced Reasoning: Specialized in logical and mathematical reasoning, trained to generate detailed, structured thought processes (enclosed in <think>...</think> tags) before providing solutions.
  • Knowledge Distillation: Absorbs high-quality reasoning data from larger, more capable teacher models across various domains (STEM, economics, social sciences).
  • Reinforcement Learning (GRPO): Utilizes Group Relative Policy Optimization to autonomously explore and optimize reasoning strategies, moving beyond simple imitation.
  • Multilingual Support: Incorporates both English and Chinese reasoning data, enhancing its capabilities in both languages.
  • Self-Reflection: Demonstrates a tendency for self-reflection and correction within its reasoning chains, indicating an internal standard for logical judgment.
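The GRPO stage mentioned above scores each prompt's sampled responses relative to one another rather than against a learned value model: each response's reward is normalized by the mean and standard deviation of its own group. A minimal sketch of that advantage computation, under the standard GRPO formulation (variable names are illustrative):

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: normalize each sampled response's reward
    against the mean and population std of its own sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All responses scored identically; no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Four sampled answers to one math prompt: two correct (1.0), two wrong (0.0).
advantages = grpo_advantages([1.0, 0.0, 1.0, 0.0])  # → [1.0, -1.0, 1.0, -1.0]
```

Correct responses receive positive advantages and incorrect ones negative, which is what pushes the model beyond imitating its teachers toward strategies that actually reach correct answers.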

Limitations

  • Resource Constraints: Performance may not match top-tier specialized models due to limited training resources.
  • Result-Oriented Bias: The strong focus on correctness in mathematical problems during RL might lead to overly concise responses in general conversations.
  • Language Mixing: May occasionally mix Chinese and English in its output due to mixed training data.
  • Imbalanced Capabilities: Strong in algebra word problems but potentially weaker in other specialized fields or general chat.
  • No External Tool Use: Lacks the ability to call external tools like calculators or search engines, limiting its precision for complex problems requiring external knowledge.