Rho-1: Selective Language Modeling for Math
microsoft/rho-math-7b-v0.1 is a 7-billion-parameter model from Microsoft, distinguished by its novel Selective Language Modeling (SLM) pretraining approach. Unlike traditional methods that train on all tokens, SLM focuses training on the "clean and useful tokens" aligned with the desired distribution, leading to more efficient learning.
Key Capabilities
- Efficient Mathematical Reasoning: Achieves competitive performance on math benchmarks using substantially fewer pretraining tokens. For instance, Rho-Math-7B matches DeepSeekMath's performance on the MATH dataset with only 3% of the pretraining tokens.
- Strong Benchmark Performance:
  - MATH dataset: 31.0% few-shot accuracy.
  - GSM8K: 66.9% few-shot accuracy.
  - SAT: 84.4% few-shot accuracy, matching DeepSeekMath-7B.
- Base Model for Math-Focused Applications: Serves as a robust foundation for tasks requiring numerical and logical problem-solving.
What Makes This Model Different?
The core differentiator is Selective Language Modeling (SLM). This technique involves three steps:
- Training a reference model on high-quality data.
- Scoring each token's loss in a corpus using the reference model.
- Selectively training the language model only on tokens exhibiting high excess loss relative to the reference model.

By concentrating computational effort on the most informative tokens, this process lets Rho-1 models achieve significant gains in training efficiency, e.g., reaching baseline performance 5-10x faster on GSM8K and MATH.
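The token-selection step above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Rho-1 implementation: the function name `select_tokens`, the `keep_ratio` parameter, and all loss values are invented for the example; in practice the losses would be per-token cross-entropy values from the training and reference models.

```python
# Minimal sketch of SLM's token selection: keep only the tokens whose
# current-model loss most exceeds the reference model's loss.
# All names and numbers are illustrative, not from the Rho-1 codebase.

def select_tokens(train_losses, ref_losses, keep_ratio=0.5):
    """Return indices of the top fraction of tokens by excess loss
    (current-model loss minus reference-model loss)."""
    excess = [t - r for t, r in zip(train_losses, ref_losses)]
    k = max(1, int(len(excess) * keep_ratio))
    # Rank tokens by excess loss, descending; only the top-k
    # contribute to the training loss under SLM.
    ranked = sorted(range(len(excess)), key=lambda i: excess[i], reverse=True)
    return sorted(ranked[:k])

# Per-token losses for one short sequence (illustrative values).
train_losses = [2.1, 0.4, 3.0, 1.2, 0.9, 2.5]
ref_losses   = [2.0, 0.5, 1.0, 1.1, 1.2, 1.0]  # reference trained on clean data
mask = select_tokens(train_losses, ref_losses, keep_ratio=0.5)
print(mask)  # → [0, 2, 5]: the tokens the model would actually train on
```

In a real training loop, the selected indices would mask the per-token loss before backpropagation, so gradient updates come only from high-excess-loss tokens.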
Good for
- Applications requiring efficient and accurate mathematical problem-solving.
- Use cases where computational efficiency during pretraining is critical.
- As a base model for further fine-tuning on specialized mathematical or reasoning tasks.