pb09204048/CRISP-DeepSeek-R1-Distill-Llama-8B-v2

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jul 2, 2026 | License: apache-2.0 | Architecture: Transformer | 0.0K | Open Weights | Cold

CRISP-DeepSeek-R1-Distill-Llama-8B-v2 is an 8-billion-parameter language model developed by pb09204048, based on the DeepSeek-R1-Distill-Llama architecture with a 32,768-token context length. It is trained with the CRISP (Compressed Reasoning via Iterative Self-Policy Distillation) method using a v2 conciseness teacher, which is designed to yield reasoning that is concise but still complete. The model aims to preserve reasoning accuracy while using fewer tokens, particularly on complex problems that require detailed case analysis.

CRISP-DeepSeek-R1-Distill-Llama-8B-v2 Overview

This model is an 8-billion-parameter variant of the DeepSeek-R1-Distill-Llama architecture, fine-tuned with the CRISP (Compressed Reasoning via Iterative Self-Policy Distillation) method. CRISP is a self-distillation technique: the model's own behavior when conditioned on a conciseness instruction is distilled back into the same model, so that it learns to reason concisely without being given the instruction. The v2 teacher prompt adds a "difficulty-aware" caveat that instructs the model not to over-compress hard or multi-step problems, so that critical details such as case analysis and edge cases are preserved.
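The teacher/student conditioning described above can be illustrated with a minimal sketch. It shows only how the two roles differ in their input: the same model plays both, and only the teacher's prompt carries the conciseness instruction. The instruction wording, the function names, and the dictionary layout are hypothetical; this card does not give the actual v2 teacher prompt, the sampling procedure, or the training loss.

```python
# Hypothetical wording: the real v2 teacher prompt is not given in this card.
CONCISENESS_INSTRUCTION = (
    "Reason concisely. Do not over-compress hard or multi-step problems: "
    "keep the case analysis and edge cases that the answer depends on."
)


def student_prompt(problem: str) -> str:
    """Input for the student role: the bare problem, no extra instruction."""
    return problem


def teacher_prompt(problem: str) -> str:
    """Input for the teacher role: the same problem, prefixed by the instruction."""
    return f"{CONCISENESS_INSTRUCTION}\n\n{problem}"


def build_pair(problem: str) -> dict:
    """Return both role inputs for one problem (assumed layout, for illustration)."""
    return {"student": student_prompt(problem), "teacher": teacher_prompt(problem)}


pair = build_pair("What is 2 + 2?")
print(pair["teacher"])
```

In a self-distillation setup of this kind, the behavior elicited by the teacher input would serve as the training signal for the student input. How CRISP turns that into an objective is not specified here.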

Key Capabilities & Features

  • Concise Reasoning: Optimized to produce shorter, more efficient reasoning paths without sacrificing accuracy.
  • Self-Policy Distillation: Uses an iterative self-distillation process in which the model acts as both teacher and student, refining its conciseness over successive iterations.
  • Difficulty-Aware Compression: The v2 training ensures that while outputs are concise, complex problems retain necessary detail and logical steps.
  • Performance Balance: Reported benchmarks show token usage reduced by up to 23.2% (on MATH-500) while accuracy remains competitive across reasoning tasks such as MATH, AIME, GPQA-Diamond, and MMLU.
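To make the 23.2% figure concrete, here is a small arithmetic sketch. The 4,000-token baseline is a hypothetical trace length chosen for illustration, not a measurement from this card, and 23.2% is the best reported case (MATH-500), so other tasks may save less.

```python
def tokens_after_reduction(baseline_tokens: int, reduction_pct: float) -> int:
    """Tokens remaining after cutting `reduction_pct` percent from a baseline."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return round(baseline_tokens * (100 - reduction_pct) / 100)


# Hypothetical 4,000-token reasoning trace at the reported best-case reduction.
baseline = 4000
remaining = tokens_after_reduction(baseline, 23.2)
print(remaining)             # 3072
print(baseline - remaining)  # 928 tokens saved
```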

When to Use This Model

This model is particularly well-suited for applications where efficient and concise reasoning is crucial, such as:

  • Automated Problem Solving: Generating clear, step-by-step solutions to mathematical or logical problems with reduced verbosity.
  • Code Generation & Explanation: Providing succinct explanations or optimized code snippets.
  • Knowledge Distillation: Creating more compact summaries or reasoning traces from larger models.
  • Resource-Constrained Environments: Balancing performance and efficiency when token budgets or inference speed matter.
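For token-budget planning, the sketch below estimates how many tokens remain for generation given the 32,768-token context length stated above. It assumes the context window is shared between the prompt and the generated output, as is usual for decoder-only models; the function name and the example prompt size are illustrative.

```python
CTX_LENGTH = 32768  # context length stated for this model


def max_new_tokens(prompt_tokens: int, ctx_length: int = CTX_LENGTH) -> int:
    """Upper bound on generated tokens once the prompt occupies part of the window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Never return a negative budget when the prompt alone exceeds the window.
    return max(ctx_length - prompt_tokens, 0)


print(max_new_tokens(1200))   # 31568
print(max_new_tokens(40000))  # 0
```

Shorter reasoning traces leave more of this budget unused, which is where the conciseness training would be expected to help.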