Llama3.1-8B-Thinking-R1: A Deep Reasoning Model
Jackrong/Llama3.1-8B-Thinking-R1 is an 8-billion-parameter model based on Llama-3.1-8B-Instruct, engineered for complex reasoning tasks in logic, mathematics, and programming. Its core innovation is a "Think-and-Answer" paradigm: the model uses <think> tags for self-correction, logical decomposition, and multi-path exploration before generating its final response.
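The tag format described above implies a simple post-processing step on the model's completions. A minimal sketch of such a parser (the helper name and exact tag handling are assumptions for illustration, not part of this model card):

```python
import re

def split_think_answer(output: str) -> tuple[str, str]:
    """Split a completion into its reasoning trace and final answer.

    Assumes the Think-and-Answer format: a <think>...</think> block
    followed by the visible response.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # No reasoning block found: treat the whole output as the answer.
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

completion = "<think>2 + 2 is 4. Double-check: yes.</think>The answer is 4."
reasoning, answer = split_think_answer(completion)
```

In a chat application, only `answer` would typically be shown to the user, while `reasoning` can be logged or hidden behind a collapsible panel.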
Key Training Methodology
The model undergoes a unique three-stage training pipeline:
- Cold-start SFT: Initial fine-tuning on high-quality mathematical reasoning data to establish basic reasoning formats and the use of <think> tags.
- GRPO Reinforcement Learning: Large-scale reinforcement training using Group Relative Policy Optimization, guided by accuracy and format rewards to optimize thought processes and reduce redundancy.
- Final CoT Distillation SFT: Instruction fine-tuning with high-quality Chain-of-Thought data distilled from ultra-large models like GPT-OSS-120B and Qwen3-235B, enhancing logical rigor and expressiveness, particularly in Chinese logic and multi-turn dialogues.
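To make the GRPO stage above concrete, here is a toy sketch of how accuracy rewards, format rewards, and group-relative normalization could fit together. The reward definitions and matching rules are illustrative assumptions, not the model's actual training code:

```python
import re
import statistics

def format_reward(completion: str) -> float:
    """1.0 if the completion is a well-formed <think>...</think> block
    followed by a non-empty answer, else 0.0."""
    m = re.fullmatch(r"\s*<think>.+?</think>\s*(.+)", completion, flags=re.DOTALL)
    return 1.0 if m else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the text after </think> matches the reference answer."""
    answer = completion.split("</think>")[-1].strip()
    return 1.0 if answer == gold else 0.0

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: normalize each sampled completion's
    reward against the mean and std of its group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in rewards]

# One sampled group for the prompt "What is 6 * 7?"
group = [
    "<think>6 * 7 = 42</think>42",  # correct and well-formatted
    "<think>6 * 7 = 41</think>41",  # well-formatted but wrong
    "42",                           # correct but missing the think block
]
rewards = [format_reward(c) + accuracy_reward(c, "42") for c in group]
advantages = grpo_advantages(rewards)
```

Because advantages are computed within each sampled group rather than by a learned value model, GRPO avoids training a separate critic, which is part of its appeal for reasoning-focused RL.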
Notable Features & Capabilities
- Reinforcement Learning: Employs the GRPO algorithm for autonomous learning of logical decomposition.
- Multi-stage Distillation: Incorporates reasoning logic from 120B+ scale models, significantly boosting performance in complex contexts.
- Long Context Support: Capable of handling complex, long-chain reasoning tasks with a context length of up to 65,536 tokens.
- Efficient Fine-Tuning: Fine-tuned with LoRA via the Unsloth framework, adding reasoning behaviors while preventing catastrophic forgetting of base-model capabilities.
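The LoRA approach mentioned above freezes the base weights and trains only small low-rank factors. A self-contained numeric sketch of the underlying math (illustrative only; these are not the model's actual adapter shapes or Unsloth's API):

```python
# LoRA adapts a frozen weight W with a low-rank product B @ A scaled by
# alpha / r, so only A and B (a small fraction of parameters) are trained.

def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def lora_forward(W, A, B, alpha, r, x):
    """Compute (W + (alpha / r) * B @ A) @ x without modifying W."""
    delta = matmul(B, A)  # low-rank update with the same shape as W
    scale = alpha / r
    W_adapted = [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]
    return matmul(W_adapted, x)

# 2x2 frozen weight with a rank-1 adapter (r = 1).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # shape r x d_in
B = [[0.5], [0.5]]      # shape d_out x r
x = [[2.0], [3.0]]
y = lora_forward(W, A, B, alpha=1.0, r=1, x=x)
```

Because only `A` and `B` receive gradients, the base model's weights stay intact, which is how LoRA-style tuning limits catastrophic forgetting.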
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Solving intricate mathematical problems.
- Executing complex logical deductions.
- Handling multi-turn dialogue scenarios that demand deep reasoning.
- Tasks benefiting from structured, self-correcting thought processes.