Duxiaoman-DI/Llama3.1-XuanYuan-FinX1-Preview

Task: Text Generation | Concurrency Cost: 4 | Model Size: 70B | Quant: FP8 | Ctx Length: 32k | Published: Dec 27, 2024 | License: llama3.1 | Architecture: Transformer

XuanYuan-FinX1-Preview is a 70 billion parameter large language model developed by Duxiaoman-DI and optimized for financial analysis, decision-making, and data processing tasks. It is the first GPT-O1-like reasoning model in the financial domain. Its training paradigm combines Chain-of-Thought (CoT) generation, process-level reward modeling, and reinforcement learning to strengthen logical reasoning. Before giving a final answer, the model outputs its thought process, including problem decomposition and analysis. It supports a context length of 32768 tokens.


XuanYuan-FinX1-Preview: Financial Reasoning LLM

XuanYuan-FinX1-Preview, developed by Duxiaoman-DI, is a 70 billion parameter large language model designed for complex financial scenarios. It is the first GPT-O1-like reasoning model in the financial domain, and it improves logical reasoning through a training paradigm that combines Chain-of-Thought (CoT) generation, process-level reward modeling, and reinforcement learning.

Key Capabilities

  • Transparent Reasoning: The model generates a complete thought process, from problem decomposition to final conclusions, before providing an answer. Coarse-grained nodes in the thinking steps are marked with "◆".
  • Financial Domain Optimization: Deeply optimized for financial analysis, decision-making, and data processing tasks, addressing the unique complexities of this sector.
  • Advanced Training Paradigm: Utilizes a three-step technical approach:
    • Stable CoT Generation: Constructs high-quality CoT/Answer data by first generating thought processes and then answers, focusing on coherence and long-context handling.
    • Dual Reward Models (ORM & PRM): Employs both outcome-oriented (ORM) and process-level (PRM) reward models. PRM specifically addresses the evaluation of open-ended financial questions by scoring each thinking step.
    • Reinforcement Learning Fine-tuning: Uses the PPO algorithm, guided by both the PRM and the ORM, to refine reasoning. The PRM helps correct errors in thinking paths, and answers are evaluated according to problem type.
  • Long Context Support: Enhanced capabilities for processing long texts, crucial for detailed financial documents.
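
The dual-reward idea in the training paradigm above can be sketched in a few lines. The model card does not say how PRM and ORM scores are combined during PPO, so everything here is an assumption for illustration: the `ScoredTrajectory` type, the `combined_reward` function, the use of the weakest step as the process score, and the linear `prm_weight` blend are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredTrajectory:
    """One sampled response with hypothetical reward-model scores."""
    step_scores: List[float]  # assumed PRM score per thinking step, in [0, 1]
    outcome_score: float      # assumed ORM score for the final answer, in [0, 1]


def combined_reward(traj: ScoredTrajectory, prm_weight: float = 0.5) -> float:
    """Blend process-level and outcome-level scores into one scalar reward.

    Assumption: the process score is the minimum step score, so a single
    faulty step lowers the reward for the whole thinking path.
    """
    if not 0.0 <= prm_weight <= 1.0:
        raise ValueError("prm_weight must be in [0, 1]")
    if not traj.step_scores:
        # No thinking steps to score: fall back to the outcome score alone.
        return traj.outcome_score
    process_score = min(traj.step_scores)
    return prm_weight * process_score + (1.0 - prm_weight) * traj.outcome_score


if __name__ == "__main__":
    traj = ScoredTrajectory(step_scores=[0.9, 0.4, 0.8], outcome_score=1.0)
    print(combined_reward(traj, prm_weight=0.5))
```

One way to reflect "evaluating answers based on problem type" in such a sketch would be to raise `prm_weight` for open-ended questions, where the card says outcome-only scoring is harder, and lower it for questions with a checkable final answer. This is a design guess, not a description of the actual training setup.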

Good For

  • Financial Analysis: Performing in-depth analysis of financial data and scenarios.
  • Decision Support: Aiding in complex financial decision-making processes by providing transparent reasoning.
  • Data Processing: Handling and interpreting financial data within a reasoning framework.
  • Research & Development: Serving as a foundation for further exploration and optimization in financial AI applications, with continuous open-source updates planned.
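
For applications that want to show or audit the reasoning separately from the answer, the "◆" markers described under Key Capabilities can be used to split the thinking process into steps. The following is a minimal sketch: only the "◆" marker comes from the model card, while the helper name `split_thinking_steps` and the sample text are hypothetical, and the model's real output layout may differ.

```python
from typing import List

STEP_MARKER = "◆"  # coarse-grained thinking node marker named in the model card


def split_thinking_steps(text: str) -> List[str]:
    """Return the text of each coarse-grained step, in order.

    Anything before the first marker is ignored; empty segments are dropped.
    """
    parts = text.split(STEP_MARKER)
    return [segment.strip() for segment in parts[1:] if segment.strip()]


if __name__ == "__main__":
    # Hypothetical sample; not actual model output.
    sample = (
        "◆ Identify the relevant cash flows\n"
        "The bond pays an annual coupon.\n"
        "◆ Choose a discount rate\n"
        "Use the stated market yield.\n"
    )
    for index, step in enumerate(split_thinking_steps(sample), start=1):
        print(f"Step {index}: {step.splitlines()[0]}")
```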