Dat1710/nexus-1.5b
Dat1710/nexus-1.5b is a 1.54 billion parameter mathematical reasoning model developed by Neuriton. Built on Qwen2.5-Math-1.5B-Instruct, it is fine-tuned using Length-Penalized Reward Optimization (LPRO), a novel reinforcement learning method that simultaneously improves accuracy and conciseness. The model achieves 80.2 on MATH-500 and 85.2 on GSM8K (CoT), outperforming its base model while reducing response length by 14%. It is primarily optimized for complex mathematical problem-solving in English and Chinese.
Loading preview...
Nexus-1.5B: Optimized for Mathematical Reasoning
Nexus-1.5B is a 1.54 billion parameter model developed by Neuriton, specifically designed for advanced mathematical reasoning. It is built upon the Qwen2.5-Math-1.5B-Instruct base model and fine-tuned using a novel reinforcement learning method called Length-Penalized Reward Optimization (LPRO). LPRO addresses common issues in standard GRPO by using asymmetric clipping, token-level normalization, and a length-penalized advantage, leading to more accurate and concise responses.
Key Capabilities
- Enhanced Mathematical Accuracy: Achieves 80.2 on MATH-500 and 85.2 on GSM8K (CoT), surpassing its base model by +4.4 points on MATH-500.
- Concise Reasoning: Reduces average response length by 14% compared to its base model, demonstrating improved efficiency without sacrificing accuracy.
- Robust Alignment: LPRO's unique approach prevents entropy collapse and length bias, promoting diverse and effective solution patterns.
- Tool-Integrated Reasoning (TIR): Supports integration with external tools, showing strong performance on benchmarks like MATH-500 (84.0) and Olympiad Bench (56.0) with TIR.
Good for
- Solving complex mathematical problems requiring step-by-step reasoning (Chain-of-Thought).
- Applications where both accuracy and conciseness of mathematical solutions are critical.
- Research and development in advanced reinforcement learning for language models.
- Use cases requiring tool integration for mathematical problem-solving, particularly in English and Chinese contexts.