continuum-ai/qwen2.5-coder-7b-compacted

Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Apr 8, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

continuum-ai/qwen2.5-coder-7b-compacted is a Qwen2.5-Coder-7B model from continuum-ai that has undergone 12% head pruning, with the resulting quality loss compensated by KL distillation through LoRA. The compacted model scores 61.0 on HumanEval, 1.2 points below the base model's 62.2. It is optimized for Python code completion where a smaller deployment footprint matters.


Overview

continuum-ai/qwen2.5-coder-7b-compacted is a compacted version of Qwen2.5-Coder-7B. The compaction has two stages: 12% head pruning, followed by KL-distillation compensation via LoRA to recover the quality lost to pruning. The goal is to reduce model size and computational overhead while preserving performance, particularly on code generation tasks.
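The recovery stage can be illustrated with a minimal sketch of a KL-distillation objective. It assumes the unpruned base model acts as the teacher and the pruned model with LoRA adapters as the student, and that the loss is KL(teacher || student) averaged over token positions; the direction of the divergence, the `temperature` parameter, and all function names here are illustrative assumptions, not details confirmed by the model card.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert one position's logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean per-position KL(teacher || student).

    Assumption: the teacher is the unpruned model and the student is the
    pruned model; only the student's adapter weights would be trained.
    """
    losses = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row, temperature)
        q = softmax(s_row, temperature)
        losses.append(kl_divergence(p, q))
    return sum(losses) / len(losses)

# Toy logits for two token positions over a three-token vocabulary.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what makes it usable as a recovery signal after pruning.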

Key Capabilities & Performance

  • Code Generation: Scores 61.0 on HumanEval and 53.0 on HumanEval+, against 62.2 and 53.7 for the unpruned base model (gaps of 1.2 and 0.7 points respectively). This indicates recovery within calibration tolerance after compaction.
  • Compaction: Applies 12% activation-magnitude head pruning, reducing the model's footprint for deployment.
  • Provenance: Uses the ForgeAlloy chain of custody for cryptographic provenance, so that claims and results can be verified.
  • Methodology Demonstration: Intended to show a method for compacting a model and recovering its performance.
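As a rough illustration of activation-magnitude head pruning, the sketch below ranks heads by a per-head score and selects the lowest 12% for removal. The scoring input, the global (rather than per-layer) ranking, and the function name `select_heads_to_prune` are assumptions made for illustration; the model card does not describe the exact selection rule, and the sketch ignores architectural details such as grouped-query attention.

```python
def select_heads_to_prune(head_scores, prune_fraction=0.12):
    """Pick the lowest-scoring heads for removal.

    head_scores maps (layer_index, head_index) to a mean activation
    magnitude, assumed to be measured on calibration data beforehand.
    """
    n_prune = round(len(head_scores) * prune_fraction)
    # Sort head identifiers by ascending score; the weakest heads come first.
    ranked = sorted(head_scores, key=head_scores.get)
    return set(ranked[:n_prune])

# Synthetic scores for a toy model with 5 layers of 5 heads each.
# These values are placeholders, not measurements from the real model.
toy_scores = {
    (layer, head): float(((layer * 5 + head) * 37) % 25)
    for layer in range(5)
    for head in range(5)
}

pruned = select_heads_to_prune(toy_scores)
kept = set(toy_scores) - pruned
```

With 25 toy heads and a 12% fraction, three heads are selected and 22 are kept; on a real model the same rule would be applied to scores gathered from calibration inputs.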

Use Cases & Limitations

  • Good For: English-language Python code completion, especially where a compact footprint is wanted for deployment on devices such as MacBooks or mobile phones. The model ships as fp16; quantized sizes are expected to go down to ~2.5GB.
  • Limitations: Currently a methodology demonstration, not necessarily a Pareto-optimal artifact for all production code workloads. Performance on other programming languages, code paradigms, or code-adjacent domains (SQL, regex, shell) has not been measured. The model is text-only and does not include a vision modality.
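For the Python completion use case, a minimal loading sketch with the Hugging Face transformers library might look like the following. It assumes the repository loads with the stock `AutoModelForCausalLM` and `AutoTokenizer` classes in fp16, which the card does not confirm; a pruned checkpoint may need custom loading code. The helper name `complete` and the example prompt are illustrative.

```python
MODEL_ID = "continuum-ai/qwen2.5-coder-7b-compacted"

def complete(prompt, max_new_tokens=64):
    """Greedy-decode a continuation of `prompt` and return only the new text."""
    # Imported lazily so the heavy dependencies are needed only when called.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: the checkpoint loads with the standard causal-LM class.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Drop the prompt tokens so that only the completion is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# An English-language Python prompt, matching the measured use case.
PROMPT = 'def is_prime(n: int) -> bool:\n    """Return True if n is prime."""\n'
```

Calling `complete(PROMPT)` downloads the weights on first use and returns the model's continuation of the function body. Prompts in other languages or domains fall outside what has been measured.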