DianJin/DianJin-R1-7B
DianJin/DianJin-R1-7B is a 7.6 billion parameter language model developed by DianJin, based on the Qwen2.5-7B-Instruct architecture, with a context length of 32768 tokens. It is specifically fine-tuned for financial reasoning tasks, utilizing a novel framework that combines reasoning-augmented supervision and reinforcement learning. The model excels at generating structured reasoning steps and accurate answers for diverse financial scenarios, including compliance checks.
DianJin-R1-7B: Financial Reasoning LLM
DianJin-R1-7B is a 7.6-billion-parameter language model built on Qwen2.5-7B-Instruct and designed to strengthen financial reasoning. Developed by DianJin, it is trained in two stages: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL).
Key Capabilities
- Enhanced Financial Reasoning: Utilizes DianJin-R1-Data, a high-quality dataset derived from CFLUE, FinQA, and a proprietary Chinese Compliance Check (CCC) corpus, to cover diverse financial reasoning scenarios.
- Structured Reasoning: Trained with SFT to generate explicit chain-of-thought (CoT) reasoning steps formatted as <think>...</think> before providing a final answer in <answer>...</answer>.
- Reinforcement Learning Optimization: Employs Group Relative Policy Optimization (GRPO) with dual reward signals (a format reward for structural adherence and an accuracy reward for correct answers) to further refine reasoning quality.
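The output format and the two reward signals described above can be illustrated with a minimal Python sketch. This is not DianJin's training code. The function names, the exact-match comparison, the 0/1 reward values, and the way the two rewards are summed are all assumptions made for illustration; the advantage step shows the usual group-relative normalization associated with GRPO (reward minus the group mean, divided by the group standard deviation).

```python
import re
from statistics import mean, pstdev

# Assumed layout: one <think> block followed by one <answer> block,
# with nothing but whitespace around them.
PATTERN = re.compile(
    r"\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*", re.DOTALL
)
TAGS = ("<think>", "</think>", "<answer>", "</answer>")


def format_reward(output: str) -> float:
    """Return 1.0 if the output follows the assumed tag layout, else 0.0."""
    if any(output.count(tag) != 1 for tag in TAGS):
        return 0.0
    return 1.0 if PATTERN.fullmatch(output) else 0.0


def accuracy_reward(output: str, reference: str) -> float:
    """Return 1.0 if the <answer> content equals the reference (assumed exact match)."""
    if format_reward(output) == 0.0:
        return 0.0
    answer = PATTERN.fullmatch(output).group(2).strip()
    return 1.0 if answer == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and std of its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of three sampled completions for one question.
group = [
    "<think>Net income is 120 - 80.</think><answer>40</answer>",  # well formed, correct
    "<think>Revenue is 120.</think><answer>120</answer>",         # well formed, wrong
    "The answer is 40.",                                          # malformed
]
rewards = [format_reward(o) + accuracy_reward(o, "40") for o in group]
advantages = group_relative_advantages(rewards)
```

In this sketch a completion that is both well formed and correct receives a positive advantage, while the malformed one receives a negative advantage, which is the relative signal the policy update would act on.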
Good For
- Applications requiring precise financial analysis and problem-solving.
- Tasks involving compliance checks and complex financial queries.
- Scenarios where transparent, step-by-step reasoning is crucial alongside accurate answers.