anicka/cve-backport-codegen-qwen25-32b-v1
anicka/cve-backport-codegen-qwen25-32b-v1 is a fine-tuned Qwen2.5-Coder-32B-Instruct model developed by anicka and optimized for security patch backporting. Rather than emitting unified diffs, it generates the fixed code region from a vulnerable code region and a fix description, which plays to LLM strengths in code completion. In per-hunk evaluation, the v3 model reaches 98% precision and 94% recall. The model is meant to assist with Linux distribution security maintenance.
CVE Backport Code Generation — Qwen2.5-Coder-32B (legacy)
This model, developed by anicka, is a fine-tuned version of Qwen2.5-Coder-32B-Instruct designed for security patch backporting. Instead of generating a unified diff directly, the model takes a vulnerable code region and a fix description and outputs the fixed version of that region. The final patch is then produced by a programmatic diff between the two regions. This approach plays to the strengths of LLMs in code completion and avoids the format-sensitivity issues of diff generation.
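The programmatic diff step can be illustrated with Python's standard `difflib`. This is a minimal sketch, not the pipeline's actual implementation: the helper `region_to_patch`, the file path, and the C snippet are all hypothetical, and the `fixed` string simply stands in for whatever the model would return.

```python
import difflib


def region_to_patch(vulnerable: str, fixed: str, path: str) -> str:
    """Build a unified diff from the original region and the rewritten region.

    Hypothetical helper for illustration; line numbers in the hunk header
    are relative to the region, not to the full file.
    """
    diff = difflib.unified_diff(
        vulnerable.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Made-up vulnerable region: copies without checking the length.
vulnerable = (
    "int copy_name(char *dst, const char *src, size_t len)\n"
    "{\n"
    "    memcpy(dst, src, len);\n"
    "    return 0;\n"
    "}\n"
)

# Stand-in for the model output: the same region with a bounds check added.
fixed = (
    "int copy_name(char *dst, const char *src, size_t len)\n"
    "{\n"
    "    if (len > NAME_MAX)\n"
    "        return -1;\n"
    "    memcpy(dst, src, len);\n"
    "    return 0;\n"
    "}\n"
)

patch = region_to_patch(vulnerable, fixed, "src/name.c")
print(patch)
```

Because the diff is computed rather than generated, the patch format is always well-formed even when the model's code is wrong. In a real pipeline the hunk offsets would still need to be shifted to the region's position in the target file.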
Key Capabilities & Features
- Specialized Code Generation: Focuses on generating fixed code for security vulnerabilities, not diffs.
- High Accuracy: The latest v3 model achieves an average recall of 94% and an average precision of 98% on held-out test cases.
- Robust Performance: Holds up across patch types, with 95% recall and 98% precision for 'Identical' patches and 89% recall and 97% precision for 'Adapted' patches.
- Optimized Training: The v3 model was trained on 35,667 cleaned examples from the anicka/cve-backport-codegen-dataset, covering over 2,300 CVEs.
- Integration: Designed to be used with the cve-backport-tool CLI for a full automated pipeline.
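The precision and recall figures above are reported per hunk, but this page does not say how they are computed. The sketch below shows one plausible line-level definition, stated purely as an assumption: precision is the share of changed lines in the generated hunk that also appear in the reference hunk, and recall is the share of reference changed lines that the generated hunk reproduces. The function names and sample hunks are hypothetical.

```python
from collections import Counter


def changed_lines(hunk: str) -> Counter:
    """Count the added/removed lines of a hunk, skipping file headers."""
    return Counter(
        line
        for line in hunk.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


def hunk_precision_recall(generated: str, reference: str) -> tuple[float, float]:
    """Line-level precision and recall of a generated hunk (assumed metric)."""
    gen = changed_lines(generated)
    ref = changed_lines(reference)
    overlap = sum((gen & ref).values())  # multiset intersection
    precision = overlap / sum(gen.values()) if gen else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


# Made-up reference hunk: the upstream fix adds two lines.
reference = (
    "@@ -1,5 +1,7 @@\n"
    " {\n"
    "+    if (len > NAME_MAX)\n"
    "+        return -1;\n"
    "     memcpy(dst, src, len);\n"
)

# Made-up generated hunk: same two lines plus one unneeded extra line.
generated = (
    "@@ -1,5 +1,8 @@\n"
    " {\n"
    "+    if (len > NAME_MAX)\n"
    "+        return -1;\n"
    "+    dst[0] = 0;\n"
    "     memcpy(dst, src, len);\n"
)

precision, recall = hunk_precision_recall(generated, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this definition, a hunk that includes every reference change plus unneeded extras scores full recall but reduced precision, which matches the intuition that precision penalizes spurious edits.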
Intended Use
This model is primarily intended as a research tool to assist with security patch backporting in Linux distribution maintenance. All generated patches require review by a maintainer before application. It is particularly useful for scenarios where precise code modification based on a fix description is needed, rather than generating a raw diff.