DianJin/DianJin-R1-7B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:Apr 21, 2025License:mitArchitecture:Transformer0.0K Open Weights Warm

DianJin/DianJin-R1-7B is a 7.6 billion parameter language model developed by DianJin, based on the Qwen2.5-7B-Instruct architecture, with a context length of 32768 tokens. It is specifically fine-tuned for financial reasoning tasks, utilizing a novel framework that combines reasoning-augmented supervision and reinforcement learning. The model excels at generating structured reasoning steps and accurate answers for diverse financial scenarios, including compliance checks.

Loading preview...

DianJin-R1-7B: Financial Reasoning LLM

DianJin-R1-7B is a 7.6 billion parameter language model built upon the Qwen2.5-7B-Instruct architecture, specifically designed to enhance financial reasoning capabilities. Developed by DianJin, this model employs a unique two-step training paradigm involving Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL).

Key Capabilities

  • Enhanced Financial Reasoning: Utilizes DianJin-R1-Data, a high-quality dataset derived from CFLUE, FinQA, and a proprietary Chinese Compliance Check (CCC) corpus, to cover diverse financial reasoning scenarios.
  • Structured Reasoning: Trained with SFT to generate explicit chain-of-thought (CoT) reasoning steps formatted as <think>...</think> before providing a final answer <answer>...</answer>.
  • Reinforcement Learning Optimization: Employs Group Relative Policy Optimization (GRPO) with dual reward signals—a format reward for structural adherence and an accuracy reward for correct answers—to further refine reasoning quality.

Good For

  • Applications requiring precise financial analysis and problem-solving.
  • Tasks involving compliance checks and complex financial queries.
  • Scenarios where transparent, step-by-step reasoning is crucial alongside accurate answers.