CVE Backport Code Generation — Qwen2.5-Coder-32B (legacy)
This model, developed by anicka, is a fine-tune of Qwen2.5-Coder-32B-Instruct built for security patch backporting. Rather than generating a unified diff directly, it takes a vulnerable code region and a fix description and outputs the fixed version of that code. The final patch is then produced programmatically by diffing the original region against the model's output. This plays to the strength of LLMs at code completion and avoids the format-sensitivity problems of having the model emit diff syntax itself.
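The split between generation and diffing described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline: the `make_patch` helper, the C snippet, and the `src/name.c` path are hypothetical, and the fixed code is hard-coded here where the model's output would normally go. Only the standard-library `difflib` module is used.

```python
import difflib


def make_patch(original: str, fixed: str, path: str) -> str:
    """Build a unified diff from the original region and the fixed region.

    Hypothetical helper for illustration; the real tooling may differ.
    """
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Vulnerable region as it would be passed to the model (made-up example).
vulnerable = (
    "int copy_name(char *dst, const char *src, size_t n)\n"
    "{\n"
    "    strcpy(dst, src);\n"
    "    return 0;\n"
    "}\n"
)

# Stand-in for the model's output: the same region with the fix applied.
fixed = (
    "int copy_name(char *dst, const char *src, size_t n)\n"
    "{\n"
    "    if (n == 0)\n"
    "        return -1;\n"
    "    strncpy(dst, src, n - 1);\n"
    "    dst[n - 1] = '\\0';\n"
    "    return 0;\n"
    "}\n"
)

patch = make_patch(vulnerable, fixed, "src/name.c")
print(patch)
```

Because the diff is computed by a library instead of written by the model, hunk headers and context lines are always consistent with the two inputs; the model only has to get the code itself right.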
Key Capabilities & Features
- Specialized Code Generation: Focuses on generating fixed code for security vulnerabilities, not diffs.
- High Accuracy: The latest v3 model achieves an average recall of 94% and an average precision of 98% on held-out test cases.
- Robust Performance: Results hold up across patch types: 95% recall and 98% precision for 'Identical' patches, and 89% recall and 97% precision for 'Adapted' patches.
- Optimized Training: The v3 model was trained on 35,667 cleaned examples from the anicka/cve-backport-codegen-dataset, covering over 2,300 CVEs.
- Integration: Designed to be used with the cve-backport-tool CLI to form a complete automated pipeline.
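The recall and precision figures above are not defined in this summary. One plausible reading, offered purely as an assumption, is a line-level comparison: which changed lines in the generated code also appear in the reference fix. The sketch below implements that reading; `change_set` and `precision_recall` are hypothetical names and the sample inputs are made up.

```python
from collections import Counter


def change_set(original: str, fixed: str) -> Counter:
    """Multiset of added ('+') and removed ('-') lines, ignoring indentation and blanks."""
    before = Counter(s.strip() for s in original.splitlines() if s.strip())
    after = Counter(s.strip() for s in fixed.splitlines() if s.strip())
    changes = Counter()
    for line, count in (after - before).items():
        changes[("+", line)] = count
    for line, count in (before - after).items():
        changes[("-", line)] = count
    return changes


def precision_recall(original: str, generated: str, reference: str):
    """Assumed metric: overlap between generated and reference line changes."""
    predicted = change_set(original, generated)
    expected = change_set(original, reference)
    overlap = sum((predicted & expected).values())
    precision = overlap / sum(predicted.values()) if predicted else 1.0
    recall = overlap / sum(expected.values()) if expected else 1.0
    return precision, recall


original = "len = strlen(src);\nmemcpy(dst, src, len);\n"
reference = (
    "len = strlen(src);\n"
    "if (len >= cap)\n"
    "    return -1;\n"
    "memcpy(dst, src, len);\n"
)
# Generated code contains the full fix plus one extra, unneeded line.
generated = reference + "log_copy(len);\n"

precision, recall = precision_recall(original, generated, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this definition, recall drops when the model misses part of the fix and precision drops when it makes changes the reference fix does not contain, as with the extra line in the sample.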
Intended Use
This model is intended primarily as a research tool to assist with security patch backporting in Linux distribution maintenance. Every generated patch must be reviewed by a maintainer before it is applied. The model is most useful when a code region needs to be modified precisely according to a fix description, rather than when a raw diff has to be generated.