AsphaltProAT/deepseek_r1_distilled_qwen_7B_sparse_50
The AsphaltProAT/deepseek_r1_distilled_qwen_7B_sparse_50 model is a 7.6-billion-parameter language model derived from DeepSeek-R1-Distill-Qwen-7B and pruned to 42.95% unstructured sparsity. Developed by AsphaltProAT, it aims to show that multi-step reasoning quality can be preserved even after substantial weight pruning. It is primarily a proof of concept for PE-MoE architectures, showing that sparse models remain viable for reasoning tasks, in particular solving word problems with step-by-step explanations.
Model Overview
AsphaltProAT/deepseek_r1_distilled_qwen_7B_sparse_50 is a 7.6-billion-parameter model derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B. It serves as a proof of concept for PE-MoE architectures, specifically showing that reasoning quality is preserved after substantial unstructured pruning.
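To put the sparsity figure in absolute terms, here is a rough count. It assumes, which the card does not state, that the 42.95% is measured over all of the roughly 7.6 billion parameters. Because 7.6 billion is itself rounded, the results are approximate.

```latex
N_{\text{zero}} \approx 0.4295 \times 7.6\times10^{9} \approx 3.26\times10^{9},
\qquad
N_{\text{nonzero}} \approx (1 - 0.4295) \times 7.6\times10^{9} \approx 4.34\times10^{9}
```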
Key Characteristics
- Base Model: DeepSeek-R1-Distill-Qwen-7B.
- Sparsity Method: Unstructured pruning using SparseGPT, targeting 50% sparsity.
- Achieved Sparsity: 42.95% actual sparsity. Weights were selected for pruning using 128 calibration samples of GSM8K math problems.
- Reasoning Preservation: In the author's tests, the pruned model still performs multi-step reasoning, solving word problems and giving step-by-step explanations.
- Hardware: Developed using a Kaggle T4 GPU.
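The pruning listed above is unstructured: any individual weight can be zeroed, independent of its row or column. The following sketch illustrates that idea and how a sparsity percentage is measured. It uses plain magnitude pruning on a small random matrix as a deliberately simpler stand-in. It is not SparseGPT, which also uses calibration inputs (here, GSM8K samples) to decide which weights to drop and to adjust the remaining ones. The function names `magnitude_prune` and `sparsity` are hypothetical.

```python
import random


def magnitude_prune(weights, target_sparsity):
    """Zero the smallest-|w| entries of a 2-D matrix (list of rows).

    Illustrative stand-in only: plain magnitude pruning, NOT SparseGPT, which
    also uses calibration activations to pick weights and update the rest.
    """
    rows, cols = len(weights), len(weights[0])
    k = int(rows * cols * target_sparsity)  # number of weights to zero
    # Rank every (row, col) position by absolute value, smallest first.
    order = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: abs(weights[rc[0]][rc[1]]),
    )
    pruned = [row[:] for row in weights]
    for r, c in order[:k]:
        pruned[r][c] = 0.0  # unstructured: any single position may be zeroed
    return pruned


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


random.seed(0)
w = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
p = magnitude_prune(w, 0.5)
print(f"sparsity: {sparsity(p):.2%}")  # 32 of 64 weights zeroed -> 50.00%

# Zero pattern: '.' = pruned, '#' = kept; zeros are scattered, not whole rows.
for row in p:
    print("".join("." if x == 0.0 else "#" for x in row))
```

SparseGPT reconstructs each layer's output from calibration data instead of ranking weights by magnitude alone. The measurement step, counting exact zeros over all entries, is the same either way.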
Limitations and Considerations
- Unstructured Sparsity: Requires sparse-aware inference engines to fully realize memory and computational benefits.
- Calibration Data: Calibration used general math problems (GSM8K) rather than domain-specific data.
- Quantization: The model is not yet quantized; an AWQ step has not been applied.
- Sparsity Variation: The achieved 42.95% sparsity falls about 7 percentage points short of the 50% target because of layer-wise variations during pruning.
- Evaluation Scope: Quality was checked mainly on simple math problems rather than comprehensive benchmarks, so the evaluation is a narrow proof of concept.
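Two of these limitations, the need for sparse-aware inference and the gap between target and achieved sparsity, can be made concrete with back-of-the-envelope arithmetic. This is a hedged sketch with hypothetical inputs, not data from this model. `global_sparsity` shows that overall sparsity is a parameter-weighted average of per-layer sparsity, so layers that end up below target pull the total under 50%. `storage_bytes` compares rough sizes of one weight matrix stored densely, in CSR with 32-bit indices, and in a simple bitmask layout. It assumes 16-bit values. The layer sizes, the matrix shape, and the two sparse layouts are all illustrative assumptions.

```python
def global_sparsity(layers):
    """Parameter-weighted fraction of zeros over (num_params, layer_sparsity) pairs."""
    total = sum(n for n, _ in layers)
    return sum(n * s for n, s in layers) / total


def storage_bytes(rows, cols, sparsity, value_bytes=2, index_bytes=4):
    """Rough bytes for one matrix in three layouts (assumes 16-bit values)."""
    n = rows * cols
    nnz = round(n * (1.0 - sparsity))  # number of nonzero weights
    dense = n * value_bytes  # zeros are still stored
    # CSR: nonzero values + one column index each + (rows + 1) row pointers.
    csr = nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes
    # Bitmask: packed nonzero values + one bit per position.
    bitmask = nnz * value_bytes + (n + 7) // 8
    return {"dense": dense, "csr": csr, "bitmask": bitmask}


# Hypothetical: four equal-size layers pruned unevenly against a 50% target.
layers = [(1_000_000, 0.50), (1_000_000, 0.50), (1_000_000, 0.45), (1_000_000, 0.35)]
print(f"global sparsity: {global_sparsity(layers):.2%}")

# Hypothetical 4096 x 4096 matrix at the reported 42.95% sparsity.
for layout, size in storage_bytes(4096, 4096, 0.4295).items():
    print(f"{layout:>8}: {size / 2**20:6.1f} MiB")
```

Under these assumptions, CSR with 32-bit indices comes out larger than the dense matrix at roughly 43% sparsity, while the bitmask layout is smaller. This illustrates why unstructured sparsity saves memory only when the storage format and inference engine are built for it.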