Overview
The continuum-ai/qwen2.5-coder-7b-compacted is a 7.6 billion parameter model derived from the Qwen2.5-Coder-7B base, developed by continuum-ai. It was compacted by pruning 12% of attention heads, selected by activation magnitude, and then compensating for the lost capacity with KL-distillation using LoRA adapters. The goal is to reduce model size while recovering performance to within calibration tolerance of the unmodified base model.
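Activation-magnitude head pruning can be sketched as follows. This is an illustrative implementation, not the exact procedure used for this model: it assumes heads are scored by mean absolute output activation over a calibration set and the lowest-scoring 12% are dropped.

```python
import numpy as np

def head_pruning_mask(head_activations, prune_frac=0.12):
    """Rank attention heads by mean absolute activation magnitude and
    mark the lowest `prune_frac` fraction for removal.

    head_activations: array of shape (num_heads, num_samples), e.g. head
    output norms collected over a calibration set.
    Returns a boolean mask of shape (num_heads,), True = keep.
    """
    scores = np.abs(head_activations).mean(axis=1)   # one score per head
    n_prune = int(round(len(scores) * prune_frac))
    pruned = np.argsort(scores)[:n_prune]            # lowest-magnitude heads
    mask = np.ones(len(scores), dtype=bool)
    mask[pruned] = False
    return mask

# Toy example: 8 heads, 16 calibration samples each, with head 0
# deliberately scaled to have the smallest activations.
rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 16)) * np.linspace(0.1, 2.0, 8)[:, None]
mask = head_pruning_mask(acts, prune_frac=0.12)
print(mask.sum(), "of", len(mask), "heads kept")
```

With 8 heads, a 12% budget rounds to one pruned head, and the lowest-magnitude head (index 0 in the toy data) is the one removed.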
Key Capabilities & Performance
- Code Generation: Optimized for English-language Python code completion.
- Benchmarking: Achieves a HumanEval score of 61.0 (compared to the base's 62.2) and a HumanEval+ score of 53.0 (base 53.7), demonstrating effective performance recovery post-pruning.
- Provenance: Features cryptographic provenance via the ForgeAlloy chain of custody, ensuring verifiable claims and results.
- Compaction Methodology: Follows a prune → lora → eval pipeline; full details are available in the methodology paper.
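The distillation step of the pipeline minimizes the KL divergence between the pruned student's and the original teacher's output distributions, with gradients flowing only through low-rank LoRA adapters. A minimal sketch of the temperature-scaled KL objective (the standard distillation loss, not necessarily the exact hyperparameters used here):

```python
import numpy as np

def kl_distill_loss(student_logits, teacher_logits, T=1.0):
    """KL(teacher || student) over the vocabulary, temperature-scaled and
    averaged over positions -- the usual distillation objective."""
    def log_softmax(x):
        x = x / T
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    t_p = np.exp(t_logp)
    # Multiply by T^2 so gradient magnitudes stay comparable across temperatures
    return float((t_p * (t_logp - s_logp)).sum(axis=-1).mean() * T * T)

# Identical distributions give zero loss; a mismatched student gives a
# positive loss that the LoRA parameters would be trained to drive down.
teacher = np.array([[1.0, 2.0, 0.5]])
zero = kl_distill_loss(teacher, teacher)
mismatch = kl_distill_loss(np.zeros((1, 3)), teacher)
print(zero, mismatch)
```

In the actual pipeline the student logits come from the pruned model plus trainable LoRA adapters, while the teacher is the frozen unmodified base.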
Limitations & Use Cases
This model is a methodology demonstration for model compaction rather than a production-ready, Pareto-optimal artifact for all hardware tiers. It is validated specifically for English-language Python code completion; performance on other programming languages, code paradigms, and code-adjacent domains has not been measured. It is currently text-only, with no vision modality integrated. For production code workloads on smaller hardware, the unmodified Qwen2.5-Coder-7B at standard quantization may be a better fit, pending future larger Qwen3.5+ forges that fully leverage the pruning dimension.