General-Reasoner-Qwen2.5-7B: Enhanced Reasoning Across Domains
General-Reasoner-Qwen2.5-7B is a 7.6-billion-parameter model from TIGER-Lab, built on the Qwen2.5-7B base architecture. Its core contribution is a training paradigm designed to improve reasoning across a broad spectrum of subjects, extending beyond the traditional focus on math and coding to fields such as physics, chemistry, finance, and the humanities.
Key Capabilities & Features
- Domain-Agnostic Reasoning: Robustly enhances reasoning across diverse academic and practical domains.
- Zero RL Training: Applies reinforcement learning directly to the base LLM, skipping the supervised fine-tuning stage that typically precedes RL.
- Diverse Verifiable Data: Trained on over 230,000 high-quality reasoning questions with verifiable answers, curated from web data spanning many disciplines.
- Model-Based Verifier: Uses a compact 1.5-billion-parameter generative verifier for context-aware, chain-of-thought answer validation, a more effective alternative to brittle rule-based answer matching.
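To see why a generative verifier helps, consider what rule-based answer checking can and cannot do. The sketch below is illustrative only, not the project's actual verifier: it implements a naive string/numeric matcher of the kind a generative verifier replaces. Simple numeric equivalences can be normalized away, but semantically equivalent free-form answers cannot, which is where a model that reads the question and reasons about the answer in context becomes necessary.

```python
from fractions import Fraction

def rule_based_match(pred: str, gold: str) -> bool:
    """Naive rule-based check: compare normalized strings, falling
    back to exact numeric equality when both sides parse as numbers."""
    p, g = pred.strip().lower(), gold.strip().lower()
    if p == g:
        return True
    try:
        # Fraction parses both "1/2" and "0.5" exactly (no float rounding).
        return Fraction(p) == Fraction(g)
    except (ValueError, ZeroDivisionError):
        return False

# Numeric equivalence across formats is caught:
print(rule_based_match("1/2", "0.5"))  # True
# ...but paraphrased, semantically equal answers are not:
print(rule_based_match("a right triangle", "right-angled triangle"))  # False
```

Cases like the second one are exactly where a chain-of-thought generative verifier, which judges the answer in the context of the question, outperforms string rules.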
Performance & Use Cases
This model outperforms both the base model and supervised fine-tuned baselines on a range of reasoning benchmarks, indicating strong generalization. It is well suited to applications that demand logical deduction and problem solving across multiple disciplines, where verifiable, robust reasoning is critical.