Shimin/qwen3_vl_8b_foreagent
ForeAgent is a Qwen3-VL-8B model fine-tuned by Shimin for AI-generated image detection. This 8-billion-parameter vision-language model performs multi-view forensic analysis that combines semantic, frequency-domain, and spatial-domain features. It classifies images as real or fake (AI-generated), reports 82.18% accuracy on the Chameleon benchmark, and outputs structured JSON with a conclusion, a confidence score, and reasoning.
Overview
ForeAgent (Forensics Agent) is a specialized 8-billion-parameter vision-language model, fine-tuned from Qwen3-VL-8B by Shimin, for AI-generated image detection. It determines whether an image is authentic or AI-generated by performing a multi-view forensic analysis. For best accuracy, the model can take both the original image and its frequency-domain representation (the wavelet cD band, i.e. the diagonal detail coefficients) as input.
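To make the frequency-domain view concrete, here is a minimal sketch of a diagonal detail (cD) band from a single-level 2D Haar wavelet transform. The model card does not say which wavelet family, decomposition level, or preprocessing ForeAgent uses, so the Haar choice and the helper name `haar_diagonal_detail` are assumptions made purely for illustration. Sign conventions also differ between wavelet libraries.

```python
def haar_diagonal_detail(image):
    """Return the single-level Haar diagonal detail band (cD) of a grayscale image.

    `image` is a list of equal-length rows of numbers. A trailing odd row or
    column is dropped so the image splits cleanly into 2x2 blocks.
    NOTE: Haar is an assumption; the model card does not name the wavelet.
    """
    rows = len(image) - len(image) % 2
    cols = (len(image[0]) - len(image[0]) % 2) if image else 0
    band = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            top_left, top_right = image[r][c], image[r][c + 1]
            bottom_left, bottom_right = image[r + 1][c], image[r + 1][c + 1]
            # High-pass filtering along both rows and columns of the 2x2 block.
            out_row.append((top_left - top_right - bottom_left + bottom_right) / 2.0)
        band.append(out_row)
    return band


# A flat region carries no diagonal detail; a checkerboard is pure diagonal detail.
flat = [[5, 5, 5, 5] for _ in range(4)]
checker = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
print(haar_diagonal_detail(flat))     # [[0.0, 0.0], [0.0, 0.0]]
print(haar_diagonal_detail(checker))  # [[1.0, 1.0], [1.0, 1.0]]
```

The band is half the size of the input in each dimension and responds only to fine, pixel-level alternation, which is why it is useful as a second forensic view alongside the original image.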
Key Capabilities
- High Accuracy: Achieves 82.18% accuracy on the Chameleon benchmark, outperforming AIDE by 16.41 percentage points.
- Multi-View Analysis: Integrates semantic features (texture, anatomy, consistency, artifacts), frequency-domain features (wavelet cD), and spatial-domain features (noise pattern residuals).
- Structured Output: Provides a JSON output including a conclusion ("real" or "fake"), a confidence score (0.0-1.0), and a brief reasoning.
- Iterative Self-Refinement: Trained using a Hindsight-Driven Self-Refining (EFA) pipeline involving iterative sampling, reflection, and evolution to improve reasoning quality and detection capabilities.
- Dual-Input Mode: Supports optional dual-image input (original + wavelet frequency domain) for best performance.
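As a rough illustration of consuming the structured output described above, the sketch below parses and validates a response. The key names `conclusion`, `confidence`, and `reasoning` are assumptions inferred from the field descriptions; the exact schema is not given here. The sample string is hand-written for the example and is not real model output.

```python
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    conclusion: str    # "real" or "fake"
    confidence: float  # expected range 0.0-1.0
    reasoning: str


def parse_verdict(raw: str) -> Verdict:
    """Parse a JSON verdict string and check it against the documented ranges.

    Key names are assumed, not confirmed by the model card.
    """
    data = json.loads(raw)
    conclusion = data["conclusion"]
    if conclusion not in ("real", "fake"):
        raise ValueError(f"unexpected conclusion: {conclusion!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Verdict(conclusion, confidence, str(data["reasoning"]))


# Hand-written sample for illustration only.
sample = '{"conclusion": "fake", "confidence": 0.91, "reasoning": "Inconsistent skin texture."}'
verdict = parse_verdict(sample)
print(verdict.conclusion, verdict.confidence)  # fake 0.91
```

Validating the fields before acting on them helps downstream code, such as a moderation queue, reject malformed or out-of-range responses instead of silently trusting them.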
Good For
- AI-generated image detection and forensic analysis.
- Deepfake detection in content moderation workflows.
- Research into multimodal reasoning for image authenticity verification.
- Integration into agentic forensic systems requiring detailed image analysis.