Jackrong/gpt-oss-120b-Distill-Llama3.1-8B-v3

Text Generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Nov 22, 2025 · License: llama3.1 · Architecture: Transformer

Jackrong/gpt-oss-120b-Distill-Llama3.1-8B-v3 is an 8-billion-parameter language model developed by Soren, based on Meta-Llama-3.1-8B with a 32,768-token context length. It is fine-tuned through a two-stage process of supervised fine-tuning (SFT) followed by reinforcement learning (GRPO) to distill advanced reasoning capabilities, and it excels particularly at mathematical problem-solving and generating detailed Chain-of-Thought (CoT) explanations. The model is optimized for complex logical reasoning tasks in both English and Chinese.

Overview

Jackrong/gpt-oss-120b-Distill-Llama3.1-8B-v3 is an 8-billion-parameter model developed by Soren, built upon the meta-llama/Meta-Llama-3.1-8B base. It is engineered to inject powerful reasoning capabilities, particularly for mathematical problem-solving, through a two-stage training process: supervised fine-tuning followed by GRPO-based reinforcement learning. It supports both English and Chinese.
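For intuition on that second stage, here is a minimal sketch of the group-relative advantage computation at the core of GRPO, assuming a simple correctness-based reward. The helper name and the example reward values are hypothetical, not taken from this model's training code.

```python
import numpy as np

def grpo_advantages(rewards_per_group):
    """Group-relative advantages: each sampled completion is scored
    against the mean and std of its own group of samples, so no
    separate value model is needed. Illustrative helper only."""
    advantages = []
    for rewards in rewards_per_group:
        r = np.asarray(rewards, dtype=np.float64)
        # Normalize within the group; epsilon guards a zero-variance group.
        advantages.append((r - r.mean()) / (r.std() + 1e-8))
    return advantages

# Example: two math prompts, four sampled solutions each, reward 1.0
# only when the final answer is correct (a result-oriented signal).
groups = [[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]]
print(grpo_advantages(groups))
```

Rewarding only final-answer correctness in this way is also the likely source of the result-oriented bias noted under Limitations below.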

Key Capabilities

  • Advanced Reasoning: Distills high-quality knowledge and explicit Chain-of-Thought (CoT) reasoning styles from larger teacher models such as gpt-oss-120b-high and Qwen3-235B.
  • Structured Output: Trained to generate detailed thought processes within <think>...</think> tags before providing solutions, enhancing interpretability (see the parsing sketch after this list).
  • Mathematical Problem-Solving: Utilizes Group Relative Policy Optimization (GRPO) in its second training stage to autonomously explore and optimize reasoning strategies for mathematics.
  • Self-Reflection: Demonstrates a tendency for self-reflection and correction within its reasoning chains, dynamically adjusting and refining its logical process.
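As referenced in the Structured Output item above, the sketch below shows one way to run the model and split its <think>...</think> reasoning from the final answer. It assumes the standard Hugging Face transformers chat-template workflow, that the checkpoint ships a chat template, and enough GPU memory for an 8B model; the prompt and variable names are illustrative.

```python
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jackrong/gpt-oss-120b-Distill-Llama3.1-8B-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" needs the accelerate package installed.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt.
text = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Split the CoT from the final answer on the </think> delimiter.
match = re.search(r"<think>(.*?)</think>(.*)", text, re.DOTALL)
if match:
    reasoning, answer = match.group(1).strip(), match.group(2).strip()
else:
    reasoning, answer = "", text.strip()
print("Reasoning:", reasoning)
print("Answer:", answer)
```

If the model emits no <think> block, as can happen for simple non-reasoning prompts, the fallback treats the whole completion as the answer.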

Good for

  • Complex Logical Reasoning: Ideal for tasks requiring structured, multi-step reasoning, especially in STEM fields.
  • Mathematical Applications: Excels in solving algebra word problems and other mathematical challenges.
  • Interpretable AI: Provides detailed thought processes, making its reasoning more transparent and understandable.
  • Bilingual Reasoning: Capable of handling reasoning tasks in both English and Chinese, leveraging mixed-language training data.

Limitations

  • Resource Constraints: Performance may not match top-tier specialized reasoning models, since it was trained with fewer steps and less data than official releases.
  • Result-Oriented Bias: The reinforcement learning stage's focus on final answer correctness might lead to overly concise responses for general, non-reasoning questions.
  • Language Mixing: May occasionally mix Chinese and English in its generated thought processes or answers due to mixed SFT data.
  • No External Tool Use: Lacks the ability to call external tools like calculators or search engines, limiting its capacity for problems requiring precise calculations or real-time external knowledge.