Leopo1d/OpenVul-Qwen3-4B-GRPO

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:May 2, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

Leopo1d/OpenVul-Qwen3-4B-GRPO is a 4 billion parameter, 32K context length language model, post-trained from OpenVul-Qwen3-4B-SFT-ep3. It is a specialized vulnerability detection reasoning LLM, utilizing on-policy reinforcement learning to identify security flaws in C/C++ code. This model excels at context-level vulnerability detection, analyzing inter-procedural contexts like global variables and callee functions.

Loading preview...

OpenVul-Qwen3-4B-GRPO: Specialized Vulnerability Detection LLM

OpenVul-Qwen3-4B-GRPO is a 4 billion parameter language model, post-trained from OpenVul-Qwen3-4B-SFT-ep3. It is designed as a specialized vulnerability detection reasoning LLM, leveraging on-policy reinforcement learning to navigate complex vulnerability analysis paths in C/C++ code. The model has a context length of 32,768 tokens.

Key Capabilities

  • Context-Level Vulnerability Detection: Focuses on analyzing vulnerabilities within the broader code context, including inter-procedural elements such as global variables, type definitions, and callee functions, rather than just isolated functions.
  • CWE Standard Adherence: Identifies security flaws in C/C++ code with a focus on Common Weakness Enumeration (CWE) standards.
  • Evidence-Based Analysis: Provides precise, evidence-based analysis without speculation, clearly labeling detected vulnerabilities.
  • Reinforcement Learning: Utilizes on-policy reinforcement learning for enhanced reasoning in vulnerability detection.

Recommended Use Cases

  • Automated Code Security Audits: Ideal for integrating into pipelines for automated scanning of C/C++ codebases to identify potential security vulnerabilities.
  • Developer Tooling: Can assist developers in identifying and understanding security flaws during the development process.
  • Security Research: Useful for researchers studying and developing methods for automated vulnerability detection in C/C++.

For optimal inference, it is recommended to use vLLM with specific parameters: enable_thinking=True, n=8, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=20, min_p=0, and max_tokens=32768.