Nexus-1.5B: Optimized for Mathematical Reasoning

Nexus-1.5B is a 1.54-billion-parameter model developed by Neuriton for advanced mathematical reasoning. It is built on the Qwen2.5-Math-1.5B-Instruct base model and fine-tuned with a reinforcement learning method called Length-Penalized Reward Optimization (LPRO). LPRO targets two problems seen with standard Group Relative Policy Optimization (GRPO), entropy collapse and length bias, by combining asymmetric clipping, token-level normalization, and a length-penalized advantage. The result is more accurate and more concise responses.
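The three LPRO ingredients named above can be illustrated with a small sketch. This is not Neuriton's implementation: the linear form of the length penalty, the function names, and the default values of `alpha`, `eps_low`, and `eps_high` are all assumptions chosen for illustration. It only shows how a length-penalized group advantage, a clip range that is wider on the upper side, and averaging over all tokens in the group could fit together.

```python
import math


def length_penalized_advantages(rewards, lengths, alpha=0.1, eps=1e-6):
    """Group-relative advantages with an ASSUMED linear length penalty."""
    n = len(rewards)
    max_len = max(lengths)
    # Assumption: subtract alpha * (length / longest length in the group).
    shaped = [r - alpha * (l / max_len) for r, l in zip(rewards, lengths)]
    mean = sum(shaped) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in shaped) / n)
    # Standardize within the group, as GRPO-style methods do.
    return [(s - mean) / (std + eps) for s in shaped]


def clipped_token_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped surrogate with an asymmetric range (eps_high > eps_low)."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


def lpro_style_loss(ratios, advantages, eps_low=0.2, eps_high=0.28):
    """Token-level normalization: average over every token in the group.

    ratios: one list of per-token probability ratios per response.
    advantages: one scalar advantage per response.
    """
    total = 0.0
    n_tokens = 0
    for token_ratios, adv in zip(ratios, advantages):
        for r in token_ratios:
            total += clipped_token_objective(r, adv, eps_low, eps_high)
            n_tokens += 1
    return -total / n_tokens
```

With two equally correct responses, the shorter one receives the higher advantage under this assumed penalty, which is the behavior the length-penalized advantage is meant to encourage. Dividing by the total token count, rather than averaging per response first, keeps long responses from having each of their tokens down-weighted.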

Key Capabilities

Enhanced Mathematical Accuracy: Achieves 80.2 on MATH-500 and 85.2 on GSM8K (CoT), surpassing its base model by 4.4 points on MATH-500.
Concise Reasoning: Reduces average response length by 14% compared to its base model, demonstrating improved efficiency without sacrificing accuracy.
Robust Alignment: LPRO's unique approach prevents entropy collapse and length bias, promoting diverse and effective solution patterns.
Tool-Integrated Reasoning (TIR): Supports integration with external tools, scoring 84.0 on MATH-500 and 56.0 on OlympiadBench with TIR.
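For readers unfamiliar with tool-integrated reasoning, the following is a minimal sketch of the general pattern: the model writes a code block, a harness executes it, and the captured output is appended to the transcript before the model continues. The block delimiters, the `tir_loop` and `run_python` helpers, and the `fake_generate` stand-in for the model are all hypothetical; the prompt format Nexus-1.5B actually expects is not specified here.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built at runtime so the delimiter does not close this listing
CODE_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_python(code):
    """Execute code and capture stdout. Not sandboxed: illustration only."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


def tir_loop(generate, prompt, max_rounds=4):
    """Alternate between model generation and tool execution."""
    transcript = prompt
    for _ in range(max_rounds):
        reply = generate(transcript)
        transcript += reply
        match = CODE_RE.search(reply)
        if match is None:
            # No tool call in this reply, so treat it as the final answer.
            return transcript
        result = run_python(match.group(1))
        # Assumed convention: feed the tool result back in an "output" block.
        transcript += "\n" + FENCE + "output\n" + result + "\n" + FENCE + "\n"
    return transcript


def fake_generate(transcript):
    """Hypothetical stand-in for a real model call."""
    if FENCE + "output" in transcript:
        return "The answer is 5050."
    return "Let me compute it.\n" + FENCE + "python\nprint(sum(range(1, 101)))\n" + FENCE


final = tir_loop(fake_generate, "Sum the integers from 1 to 100. ")
```

In a real deployment the `exec` call would be replaced by a sandboxed interpreter, since model-written code should not run with the harness's own privileges.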

Good for

Solving complex mathematical problems requiring step-by-step reasoning (Chain-of-Thought).
Applications where both accuracy and conciseness of mathematical solutions are critical.
Research and development in advanced reinforcement learning for language models.
Use cases requiring tool integration for mathematical problem-solving, particularly in English and Chinese contexts.
