OpenVul-Qwen3-4B-GRPO: Specialized Vulnerability Detection LLM

OpenVul-Qwen3-4B-GRPO is a 4 billion parameter language model, post-trained from OpenVul-Qwen3-4B-SFT-ep3. It is designed as a specialized vulnerability detection reasoning LLM, leveraging on-policy reinforcement learning to navigate complex vulnerability analysis paths in C/C++ code. The model has a context length of 32,768 tokens.

Key Capabilities

Context-Level Vulnerability Detection: Focuses on analyzing vulnerabilities within the broader code context, including inter-procedural elements such as global variables, type definitions, and callee functions, rather than just isolated functions.
CWE Standard Adherence: Identifies security flaws in C/C++ code with a focus on Common Weakness Enumeration (CWE) standards.
Evidence-Based Analysis: Provides precise, evidence-based analysis without speculation, clearly labeling detected vulnerabilities.
Reinforcement Learning: Utilizes on-policy reinforcement learning for enhanced reasoning in vulnerability detection.

Recommended Use Cases

Automated Code Security Audits: Ideal for integrating into pipelines for automated scanning of C/C++ codebases to identify potential security vulnerabilities.
Developer Tooling: Can assist developers in identifying and understanding security flaws during the development process.
Security Research: Useful for researchers studying and developing methods for automated vulnerability detection in C/C++.

For optimal inference, it is recommended to use vLLM with specific parameters: enable_thinking=True, n=8, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=20, min_p=0, and max_tokens=32768.

Overview

OpenVul-Qwen3-4B-GRPO: Specialized Vulnerability Detection LLM

Key Capabilities

Recommended Use Cases

Full Model Card (README)