InfiX-ai/InfiR-1B-Instruct is a 1-billion-parameter instruction-tuned causal language model developed by InfiX, continually pretrained from Llama-3.2-1B. It supports a 32768-token context length and is optimized for reasoning tasks, including mathematical problem-solving, code generation, and chain-of-thought reasoning. The model aims to bring stronger reasoning to smaller language models, balancing performance against reduced computational overhead.
InfiR-1B-Instruct: A Reasoning-Focused Small Language Model
InfiR-1B-Instruct, developed by InfiX, is a 1-billion-parameter instruction-tuned model built on the Llama-3.2-1B architecture. It is designed to strengthen reasoning in small language models, lowering adoption barriers and easing the privacy concerns associated with larger models. The model uses an extended sequence length of 4096 tokens during training and supports a context length of 32768 tokens.
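The model can be used through the standard Hugging Face transformers chat API. A minimal sketch follows, assuming the repository id shown on this card and that the tokenizer ships a chat template (as Llama-3.2 derivatives typically do); the `max_new_tokens` value is illustrative, not a setting recommended by the authors.

```python
# Minimal usage sketch for InfiR-1B-Instruct via transformers.
# Assumptions: the model id below matches this card, and the
# tokenizer provides a chat template; generation settings are
# illustrative only.
MODEL_ID = "InfiX-ai/InfiR-1B-Instruct"


def build_messages(question: str) -> list[dict]:
    # Single-turn message list in the format expected by
    # tokenizer.apply_chat_template for instruction-tuned models.
    return [{"role": "user", "content": question}]


def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the sketch can be read and unit-tested
    # without the heavyweight dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    )
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

For example, `print(generate_answer("What is 17 * 24?"))` would run a single-turn query against the model.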
Key Capabilities
- Enhanced Reasoning: Demonstrates improved performance in mathematical reasoning, code generation, and chain-of-thought problem-solving compared with Llama-3.2-1B-Instruct, the instruction-tuned counterpart of its Llama-3.2-1B base.
- Efficient Performance: Achieves competitive results on benchmarks like GSM8K (70.9), MATH (46.4), and HumanEval (58.54) within its 1B parameter class, outperforming Llama-3.2-1B-Instruct.
- Optimized Training: Underwent a multi-stage training process including 900B tokens of pre-training (52% code, 48% high-quality web data), 40B tokens of annealing with extra math and code, and 4M tokens of Supervised Fine-Tuning (SFT) on diverse instruction datasets.
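The chain-of-thought behaviour described above is typically elicited by asking explicitly for step-by-step reasoning in the prompt. The helper below sketches that common pattern; the wording is illustrative and not a prompt format specified by InfiX.

```python
def reasoning_prompt(problem: str) -> str:
    # Wrap a math or word problem in a step-by-step instruction,
    # a common pattern for eliciting chain-of-thought from
    # instruction-tuned models (illustrative, not InfiX's format).
    return (
        "Solve the following problem. Reason step by step, "
        "then state the final answer on its own line.\n\n"
        f"Problem: {problem}"
    )


prompt = reasoning_prompt("A shop sells pens at 3 for $2. How much do 12 pens cost?")
```

The resulting string can be passed as the user message when querying the model, letting the answer be checked against the final line of the response.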
Good For
- Applications requiring strong reasoning in a compact form factor: Ideal for scenarios where computational resources are limited but robust reasoning for tasks like mathematical problem-solving or code generation is crucial.
- Developers seeking an efficient alternative: Offers a performant option for English-language tasks, particularly for those looking to deploy smaller, more manageable models without sacrificing core reasoning abilities.
- Research into small language models (SLMs): Provides a strong baseline for further exploration and development in the SLM space, especially concerning reasoning and efficiency.