continuum-ai/qwen2.5-coder-7b-compacted

TEXT GENERATION

  • Concurrency Cost: 1
  • Model Size: 7.6B
  • Quant: FP8
  • Ctx Length: 32k
  • Published: Apr 8, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Tags: Open Weights, Cold

The continuum-ai/qwen2.5-coder-7b-compacted is a 7.6 billion parameter causal language model from the Qwen2.5-Coder family, developed by continuum-ai. This model has undergone 12% head pruning and KL-distillation compensation via LoRA, resulting in a compact version that maintains strong performance on coding tasks. It is specifically optimized for English-language Python code completion, achieving a HumanEval score of 61.0 with a 32768 token context length.


Overview

The continuum-ai/qwen2.5-coder-7b-compacted is a 7.6 billion parameter model derived from the Qwen2.5-Coder-7B base, developed by continuum-ai. Its primary distinction lies in its compaction: 12% of attention heads are pruned (ranked by activation magnitude), followed by KL-distillation compensation using LoRA. This process reduces model size while recovering performance to within calibration tolerance of the unmodified base model.
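The head-selection step described above can be sketched as follows. This is a minimal illustration, not continuum-ai's actual pipeline: the function name, tensor shapes, and the use of mean absolute activation as the ranking score are all assumptions; the 12% prune fraction is taken from the card.

```python
import numpy as np

def select_heads_to_keep(head_activations, prune_fraction=0.12):
    """Rank attention heads by mean absolute activation magnitude over a
    calibration set and drop the lowest-scoring fraction.

    head_activations: array of shape (num_heads, num_tokens, head_dim)
    (hypothetical calibration activations, not the real pipeline's format).
    Returns the sorted indices of heads to keep.
    """
    scores = np.abs(head_activations).mean(axis=(1, 2))  # one score per head
    num_prune = int(round(len(scores) * prune_fraction))
    keep = np.argsort(scores)[num_prune:]                # discard weakest heads
    return np.sort(keep)

# Example: 28 heads of synthetic calibration activations
rng = np.random.default_rng(0)
acts = rng.normal(size=(28, 64, 128))
kept = select_heads_to_keep(acts)
print(len(kept))  # round(28 * 0.12) = 3 heads pruned, 25 kept
```

In a real pruning pass the discarded heads' projection rows/columns would then be physically removed from the attention weight matrices, which is what shrinks the parameter count.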

Key Capabilities & Performance

  • Code Generation: Optimized for English-language Python code completion.
  • Benchmarking: Achieves a HumanEval score of 61.0 (compared to the base's 62.2) and a HumanEval+ score of 53.0 (base 53.7), demonstrating effective performance recovery post-pruning.
  • Provenance: Features cryptographic provenance via the ForgeAlloy chain of custody, ensuring verifiable claims and results.
  • Compaction Methodology: Utilizes a prune → LoRA → eval pipeline, with full details available in the methodology paper.
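The LoRA compensation stage in the pipeline above is trained to minimize the KL divergence between the pruned student's and the unmodified teacher's token distributions. A minimal sketch of that objective, assuming a standard temperature-scaled forward KL (the temperature value and function names here are illustrative, not from the card):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kl_distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Forward KL(teacher || student) over the vocabulary, averaged over
    token positions -- the signal a LoRA adapter would be trained to
    minimize so the pruned model tracks the base model's distribution.
    `temperature` is a hypothetical hyperparameter, not from the card."""
    t = softmax(teacher_logits / temperature)
    log_s = np.log(softmax(student_logits / temperature))
    log_t = np.log(t)
    return float((t * (log_t - log_s)).sum(axis=-1).mean())

# Identical distributions give zero loss; any mismatch gives a positive loss
logits = np.array([[1.0, 2.0, 0.5]])
assert abs(kl_distill_loss(logits, logits)) < 1e-9
```

Because only the low-rank LoRA adapters are updated during this stage, the compensation is cheap relative to full fine-tuning, which is the point of the prune → LoRA → eval design.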

Limitations & Use Cases

This model serves as a methodology demonstration for model compaction rather than a production-ready, Pareto-optimal artifact for all hardware tiers. It is validated specifically for English-language Python code completion and has not been measured on other programming languages, coding paradigms, or code-adjacent domains. It is currently text-only; vision modality is not yet integrated. For production code workloads on smaller hardware, the unmodified Qwen2.5-Coder-7B at standard quantization may be a better fit, pending future larger Qwen3.5+ forges that fully leverage the pruning dimension.