Leopo1d/OpenVul-Qwen3-4B-GRPO
Leopo1d/OpenVul-Qwen3-4B-GRPO is a 4 billion parameter, 32K context length language model, post-trained from OpenVul-Qwen3-4B-SFT-ep3. It is a specialized vulnerability detection reasoning LLM, utilizing on-policy reinforcement learning to identify security flaws in C/C++ code. This model excels at context-level vulnerability detection, analyzing inter-procedural contexts like global variables and callee functions.
Loading preview...
OpenVul-Qwen3-4B-GRPO: Specialized Vulnerability Detection LLM
OpenVul-Qwen3-4B-GRPO is a 4 billion parameter language model, post-trained from OpenVul-Qwen3-4B-SFT-ep3. It is designed as a specialized vulnerability detection reasoning LLM, leveraging on-policy reinforcement learning to navigate complex vulnerability analysis paths in C/C++ code. The model has a context length of 32,768 tokens.
Key Capabilities
- Context-Level Vulnerability Detection: Focuses on analyzing vulnerabilities within the broader code context, including inter-procedural elements such as global variables, type definitions, and callee functions, rather than just isolated functions.
- CWE Standard Adherence: Identifies security flaws in C/C++ code with a focus on Common Weakness Enumeration (CWE) standards.
- Evidence-Based Analysis: Provides precise, evidence-based analysis without speculation, clearly labeling detected vulnerabilities.
- Reinforcement Learning: Utilizes on-policy reinforcement learning for enhanced reasoning in vulnerability detection.
Recommended Use Cases
- Automated Code Security Audits: Ideal for integrating into pipelines for automated scanning of C/C++ codebases to identify potential security vulnerabilities.
- Developer Tooling: Can assist developers in identifying and understanding security flaws during the development process.
- Security Research: Useful for researchers studying and developing methods for automated vulnerability detection in C/C++.
For optimal inference, it is recommended to use vLLM with specific parameters: enable_thinking=True, n=8, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=20, min_p=0, and max_tokens=32768.