LegalOne-1.7B: Specialized for Chinese Legal Reasoning
LegalOne-1.7B belongs to the LegalOne family of foundation models for the Chinese legal domain, developed by CSHaitao. Built on Qwen3-1.7B-Base, this 1.7-billion-parameter model targets the need for reliable AI systems in legal settings, where general-purpose LLMs often fall short because legal reasoning is knowledge-intensive and highly structured.
Key Capabilities & Training Methodology
The model is trained with a three-stage framework designed to improve legal knowledge and reasoning together:
- Mid-training: Uses Plasticity-Adjusted Sampling (PAS) to schedule training data, shifting smoothly from broad general data to specialized legal data. This injects legal knowledge while mitigating catastrophic forgetting.
- Supervised fine-tuning: Uses Legal Agentic CoT Distillation (LEAD), a system that simulates professional legal workflows to generate large-scale, highly consistent reasoning trajectories. Training on these trajectories builds the model's ability to reason reliably about legal questions.
- Reinforcement learning: Applies multi-stage curriculum learning that progresses from simple to complex tasks, shaping an internalized, autonomous "legal thinking" pattern.
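The exact PAS sampling rule is not described here. As a rough illustration of what "shifting smoothly from general to legal data" can look like, the Python sketch below uses a hypothetical half-cosine schedule. The function names, the cosine shape, and the 10% and 80% start and end legal shares are all assumptions, not LegalOne's actual settings. Keeping some general data in the mix even at the end is one common way to limit catastrophic forgetting.

```python
import math
import random


def legal_fraction(step: int, total_steps: int,
                   start: float = 0.1, end: float = 0.8) -> float:
    """Hypothetical share of legal data at a given training step.

    Rises smoothly from `start` to `end` along a half-cosine curve.
    Illustrative only: this is NOT the PAS rule used for LegalOne.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    weight = (1 - math.cos(math.pi * progress)) / 2  # 0 -> 1, flat at both ends
    return start + (end - start) * weight


def sample_source(step: int, total_steps: int, rng: random.Random) -> str:
    """Choose which corpus the next training example is drawn from."""
    p_legal = legal_fraction(step, total_steps)
    return "legal" if rng.random() < p_legal else "general"


if __name__ == "__main__":
    rng = random.Random(0)
    total = 1000
    for step in (0, 250, 500, 750, 1000):
        draws = [sample_source(step, total, rng) for _ in range(2000)]
        share = draws.count("legal") / len(draws)
        print(f"step {step}: target {legal_fraction(step, total):.2f}, sampled {share:.2f}")
```

Swapping in a different curve for `legal_fraction` changes only the pacing of the transition; the sampler stays the same.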
Performance and Use Cases
Within the LegalOne series, LegalOne-8B is the model highlighted for the strongest performance, while the 1.7B version is the lightweight option for deployment. Across the family, the models perform well on tasks such as:
- Legal knowledge understanding and memorization of legal provisions.
- Case law reasoning and multi-hop inference.
- Legal question answering and document drafting.
LegalOne models are trained on a mixed corpus of approximately 100 billion tokens spanning general, legal, and synthetic data, with an emphasis on recent legal documents that are still in force. The model follows a "think first, then answer" output format: it writes out its reasoning before giving the final response. All evaluation results were produced with the LegalKit evaluation toolkit to support reproducibility and transparency.
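Because the model writes its reasoning before the answer, downstream code often needs to separate the two. The sketch below assumes the reasoning ends with a `</think>` tag, as in Qwen3-style chat templates; both the tag and the `split_think_answer` helper are assumptions, so check the model's actual output format before relying on it. It also handles the case where the opening `<think>` tag was part of the prompt and does not appear in the generated text.

```python
from typing import NamedTuple


class ParsedOutput(NamedTuple):
    thinking: str  # reasoning trace (empty if no closing tag was found)
    answer: str    # text after the reasoning, i.e. the final response


def split_think_answer(text: str) -> ParsedOutput:
    """Split generated text into reasoning and final answer.

    Assumption: reasoning is terminated by "</think>" and may or may not
    start with "<think>". Without a closing tag, everything is the answer.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        return ParsedOutput(thinking="", answer=text.strip())
    thinking = head.replace("<think>", "", 1).strip()
    return ParsedOutput(thinking=thinking, answer=tail.strip())


if __name__ == "__main__":
    raw = "<think>Identify the applicable provision, then apply it to the facts.</think>\nFinal answer: ..."
    parsed = split_think_answer(raw)
    print("THINKING:", parsed.thinking)
    print("ANSWER:", parsed.answer)
```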