CRISP-DeepSeek-R1-Distill-Llama-8B-v2 Overview

This model is an 8-billion-parameter variant of the DeepSeek-R1-Distill-Llama architecture, fine-tuned with the CRISP (Compressed Reasoning via Iterative Self-Policy Distillation) method. CRISP is a self-distillation technique: the model learns to generate concise reasoning by distilling its own behavior, conditioned on a conciseness instruction, back into itself. The v2 teacher prompt adds a "difficulty-aware" caveat that instructs the model not to over-compress hard or multi-step problems, so that critical details such as case analysis and edge cases are preserved.
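The teacher/student split described above can be sketched as follows. This is a minimal illustration, not the CRISP implementation: the instruction wording, the function names, and the assumption that the student sees the bare problem without the instruction are all hypothetical, since this overview does not give the actual prompts.

```python
# Hypothetical wording: the actual CRISP v2 teacher prompt is not given in this overview.
CONCISENESS_INSTRUCTION = (
    "Solve the problem with concise reasoning. "
    "Do not over-compress hard or multi-step problems: "
    "keep case analysis and edge cases."
)


def build_student_prompt(problem: str) -> str:
    """Prompt for the student view: the bare problem, no conciseness instruction."""
    return problem.strip()


def build_teacher_prompt(problem: str) -> str:
    """Prompt for the teacher view: the same problem, preceded by the instruction."""
    return f"{CONCISENESS_INSTRUCTION}\n\n{problem.strip()}"


def make_distillation_pair(problem: str) -> dict:
    """Pair the two views of one problem.

    In a self-distillation setup, the same model would answer the teacher
    prompt, and that behavior would serve as the training target for the
    student prompt.
    """
    return {
        "student_prompt": build_student_prompt(problem),
        "teacher_prompt": build_teacher_prompt(problem),
    }


pair = make_distillation_pair("How many positive divisors does 360 have?")
print(pair["teacher_prompt"])
```

Only the prompt differs between the two views; the model weights are shared, which is what makes the procedure self-distillation rather than distillation from a separate, larger teacher.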

Key Capabilities & Features

Concise Reasoning: Optimized to produce shorter, more efficient reasoning paths while keeping accuracy competitive.
Self-Policy Distillation: Uses an iterative self-distillation process in which the model acts as both teacher and student, refining its conciseness over successive rounds.
Difficulty-Aware Compression: The v2 training keeps outputs concise while letting complex problems retain the detail and logical steps they need.
Performance Balance: Benchmarks show token usage reduced by up to 23.2% on MATH-500 while accuracy stays competitive across reasoning tasks such as MATH, AIME, GPQA-Diamond, and MMLU.
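For readers who want to reproduce a figure like the 23.2% token reduction on their own runs, a relative reduction can be computed as below. This is a sketch under assumptions: the overview does not say which baseline or which token statistic was used, so the function name and the sample counts are hypothetical and were chosen only to make the arithmetic concrete.

```python
def token_reduction_pct(baseline_tokens: float, concise_tokens: float) -> float:
    """Percentage by which token usage dropped relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - concise_tokens) / baseline_tokens * 100.0


# Hypothetical average response lengths, not measured values.
baseline_avg = 1000.0
concise_avg = 768.0
print(f"{token_reduction_pct(baseline_avg, concise_avg):.1f}% fewer tokens")
# prints "23.2% fewer tokens"
```

A reduction figure is only meaningful next to the accuracy measured on the same benchmark, so both numbers should be reported together.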

When to Use This Model

This model is particularly well-suited for applications where efficient and concise reasoning is crucial, such as:

Automated Problem Solving: Generating clear, step-by-step solutions to mathematical or logical problems with reduced verbosity.
Code Generation & Explanation: Providing succinct explanations or optimized code snippets.
Knowledge Distillation: Creating more compact summaries or reasoning traces from larger models.
Resource-Constrained Environments: Serving under tight token budgets or latency targets, where the model offers a balance of performance and efficiency.
