XuanYuan-FinX1-Preview: Financial Reasoning LLM
XuanYuan-FinX1-Preview, developed by Duxiaoman-DI, is a 70-billion-parameter large language model designed for complex financial scenarios. It is presented as the first GPT-O1-like reasoning model in the financial domain, and it strengthens logical reasoning through a training paradigm that combines Chain-of-Thought (CoT) generation, process rewards, and reinforcement learning.
Key Capabilities
- Transparent Reasoning: The model generates its complete thought process, from problem decomposition to final conclusion, before providing an answer. The output includes detailed thinking steps, with coarse-grained nodes marked by "◆".
- Financial Domain Optimization: Deeply optimized for financial analysis, decision-making, and data processing tasks, addressing the unique complexities of this sector.
- Advanced Training Paradigm: Uses a three-step technical approach:
  - Stable CoT Generation: Constructs high-quality CoT/Answer data by generating the thought process first and the answer second, with a focus on coherence and long-context handling.
  - Dual Reward Models (ORM & PRM): Employs both an outcome-oriented reward model (ORM) and a process-level reward model (PRM). The PRM scores each thinking step, which makes it possible to evaluate open-ended financial questions.
  - Reinforcement Learning Fine-tuning: Uses the PPO algorithm, guided by both the PRM and the ORM, to refine reasoning, correcting errors in thinking paths and evaluating answers according to problem type.
- Long Context Support: Enhanced capabilities for processing long texts, crucial for detailed financial documents.
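As a rough illustration of how the dual reward models described above could feed a single scalar reward into PPO, the sketch below blends per-step PRM scores with one ORM score. Everything in it is an assumption made for illustration: the names `RewardWeights` and `combined_reward`, the equal default weights, and the choice to aggregate step scores with `min` are hypothetical and are not taken from the XuanYuan training code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical blend weights; the values are illustrative only."""
    process: float = 0.5  # weight on the PRM (per-step) signal
    outcome: float = 0.5  # weight on the ORM (final-answer) signal


def combined_reward(
    step_scores: Sequence[float],
    outcome_score: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Blend PRM step scores and an ORM answer score into one scalar.

    Assumption: step scores are aggregated with min(), so a single weak
    reasoning step lowers the reward for the whole thinking path.
    """
    if not step_scores:
        # No scored steps: fall back to the outcome signal alone.
        return outcome_score
    process_score = min(step_scores)
    return weights.process * process_score + weights.outcome * outcome_score


# Three thinking steps scored by a (hypothetical) PRM, plus an ORM score.
reward = combined_reward([0.9, 0.4, 0.8], outcome_score=1.0)
```

Taking the minimum is only one option; a mean or a product of step scores would penalize a flawed step less or more sharply, and the weights could differ by problem type.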
Good For
- Financial Analysis: Performing in-depth analysis of financial data and scenarios.
- Decision Support: Aiding in complex financial decision-making processes by providing transparent reasoning.
- Data Processing: Handling and interpreting financial data within a reasoning framework.
- Research & Development: Serving as a foundation for further exploration and optimization in financial AI applications, with continuous open-source updates planned.
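
For applications that want to surface the model's reasoning, the "◆" markers mentioned under Key Capabilities offer a natural place to split the thought process into steps. The following is a minimal sketch that assumes each coarse-grained node begins with "◆" and that the thought text is already available as a plain string; the function name `split_reasoning_steps` and the sample text are hypothetical, and the real output layout may differ.

```python
from typing import List


def split_reasoning_steps(thought: str, marker: str = "◆") -> List[str]:
    """Split a thought process into coarse-grained steps.

    Assumption: every step starts with `marker`. Any text that appears
    before the first marker is treated as preamble and dropped.
    """
    parts = thought.split(marker)
    return [part.strip() for part in parts[1:] if part.strip()]


# Hypothetical sample text, not real model output.
sample = (
    "◆ Identify cash flows\n"
    "Revenue is 120 and costs are 80.\n"
    "◆ Compute margin\n"
    "Margin is 40 / 120."
)

steps = split_reasoning_steps(sample)
for index, step in enumerate(steps, start=1):
    # The first line of each step serves as a short title.
    print(index, step.splitlines()[0])
```

Each returned step could then be displayed to an analyst or logged alongside the final answer for review.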