DianJin/DianJin-R1-32B
DianJin/DianJin-R1-32B is a 32.8 billion parameter large language model developed by Alibaba Cloud, based on Qwen2.5-32B-Instruct. It is specifically fine-tuned for enhanced financial reasoning, utilizing a novel framework that combines reasoning-augmented supervision and reinforcement learning. The model excels at generating structured reasoning steps and accurate answers for complex financial scenarios, making it suitable for applications requiring robust financial analysis and compliance checks.
DianJin-R1-32B: Enhanced Financial Reasoning LLM
DianJin-R1-32B, developed by Alibaba Cloud, is a 32.8 billion parameter model built on Qwen2.5-32B-Instruct and engineered to improve financial reasoning. It is trained in two stages: Supervised Fine-Tuning (SFT) on reasoning-augmented data, followed by Reinforcement Learning (RL).
Key Capabilities
- Financial Reasoning: Specialized in understanding and processing complex financial questions, generating step-by-step reasoning, and providing accurate answers.
- Structured Output: Trained to produce coherent reasoning paths (e.g., <think>...</think>) followed by final answers (<answer>...</answer>), facilitating interpretability.
- Reinforcement Learning (RL) Enhancement: Utilizes Group Relative Policy Optimization (GRPO) with dual reward signals for output structure and answer accuracy, further refining reasoning quality.
- Diverse Training Data: Fine-tuned on DianJin-R1-Data, a high-quality dataset derived from CFLUE, FinQA, and a proprietary Chinese Compliance Check (CCC) corpus, covering various financial reasoning scenarios.
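
To make the dual reward idea concrete, here is a minimal sketch of how a structure reward and an accuracy reward could be scored for a group of sampled outputs and turned into group-relative advantages, which is the core normalization step in GRPO. This is an illustration, not the training code used for DianJin-R1: the function names, the 0/1 reward values, the exact-match answer check, and summing the two rewards are all assumptions.

```python
import re
from statistics import mean, pstdev

# Assumed layout: a <think> block followed by an <answer> block, nothing else.
PATTERN = re.compile(r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL)
TAGS = ("<think>", "</think>", "<answer>", "</answer>")


def split_output(output):
    """Return (reasoning, answer) if the output follows the assumed layout, else None."""
    match = PATTERN.fullmatch(output)
    # Require each tag exactly once so repeated blocks are rejected.
    if match is None or any(output.count(tag) != 1 for tag in TAGS):
        return None
    return match.group(1).strip(), match.group(2).strip()


def format_reward(output):
    """Hypothetical structure reward: 1.0 for a well-formed output, else 0.0."""
    return 1.0 if split_output(output) is not None else 0.0


def accuracy_reward(output, gold):
    """Hypothetical accuracy reward: 1.0 if the extracted answer equals the gold answer."""
    parts = split_output(output)
    if parts is None:
        return 0.0
    return 1.0 if parts[1] == gold.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions, gold answer "B" (made-up examples).
samples = [
    "<think>2+2</think><answer>B</answer>",
    "<think>x</think><answer>A</answer>",
    "B",
    "<think>y</think><answer>B</answer>",
]
# Assumption: the two signals are simply added.
rewards = [format_reward(s) + accuracy_reward(s, "B") for s in samples]
advantages = group_relative_advantages(rewards)
```

Outputs that are both well formed and correct end up with positive advantages, while malformed outputs are pushed below the group average. When every sample in a group gets the same reward, all advantages are zero and the group contributes no learning signal.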
Good For
- Applications requiring robust financial analysis.
- Tasks involving financial compliance checks.
- Generating detailed reasoning for financial queries.
- Use cases where structured, accurate financial insights are critical.
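
For downstream use, the reasoning and the final answer usually need to be separated before the answer is stored or checked. The sketch below shows one way to do that with the standard library only. It assumes the model emits the <think>...</think> and <answer>...</answer> tags described above; `ParsedResponse` and `parse_response` are hypothetical names, and the sample string is made up rather than real model output.

```python
import re
from dataclasses import dataclass
from typing import Optional

_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


@dataclass
class ParsedResponse:
    reasoning: Optional[str]  # text inside the first <think> block, if any
    answer: Optional[str]     # text inside the first <answer> block, if any
    well_formed: bool         # True when both blocks exist and reasoning comes first


def parse_response(text: str) -> ParsedResponse:
    """Split a generated response into reasoning and answer parts."""
    think = _THINK.search(text)
    answer = _ANSWER.search(text)
    return ParsedResponse(
        reasoning=think.group(1).strip() if think else None,
        answer=answer.group(1).strip() if answer else None,
        well_formed=bool(think and answer and think.end() <= answer.start()),
    )


# Made-up response text for illustration.
parsed = parse_response(
    "<think>Rates rose, so the bond price falls.</think>\n<answer>B</answer>"
)
fallback = parse_response("B")
```

Checking `well_formed` before trusting `answer` lets a caller route malformed generations to a retry or a manual review step instead of silently accepting them.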