sail/Llama-3.2-3B-Oat-Zero
sail/Llama-3.2-3B-Oat-Zero is a 3-billion-parameter language model developed by sail, fine-tuned with the minimalist R1-Zero recipe and the Dr. GRPO algorithm. Built on lkevinzc/Llama-3.2-3B-NuminaQA, it is optimized for mathematical reasoning and was trained on level 3-5 questions from the MATH dataset. The model supports a 32,768-token context length and performs strongly on widely used math benchmarks.
sail/Llama-3.2-3B-Oat-Zero: Specialized for Mathematical Reasoning
sail/Llama-3.2-3B-Oat-Zero is a 3-billion-parameter model developed by sail, focused on enhanced mathematical problem-solving. It is built on the lkevinzc/Llama-3.2-3B-NuminaQA base model and fine-tuned with the R1-Zero recipe and the Dr. GRPO algorithm, as detailed in the accompanying research paper.
Key Capabilities
- Mathematical Reasoning: Specifically trained on challenging level 3-5 questions from the MATH dataset.
- R1-Zero Recipe: Utilizes a minimalist training approach for efficient and targeted performance.
- Dr. GRPO Algorithm: Fine-tuned with Dr. GRPO, a bias-corrected variant of the GRPO reinforcement-learning objective.
- Context Length: Supports a substantial context window of 32,768 tokens.
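The capabilities above can be exercised through a standard Hugging Face transformers workflow. The snippet below is a minimal sketch, not an official usage example: only the model id and context length come from this card, and the sampling settings are illustrative.

```python
# Minimal inference sketch for sail/Llama-3.2-3B-Oat-Zero using Hugging Face
# transformers. Assumes `pip install transformers torch` and enough memory
# for a 3B-parameter model; generation settings are illustrative, not tuned.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "sail/Llama-3.2-3B-Oat-Zero"
MAX_CONTEXT = 32768  # context window stated on the model card


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Run one greedy-ish generation pass and return only the completion."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only newly generated text is returned.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("What is 12 * 17? Reason step by step."))
```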
Performance
Evaluation results indicate strong performance on various math benchmarks, positioning it as a capable model for complex quantitative tasks. The model employs a specific R1 template for prompting, designed to facilitate step-by-step reasoning and structured answers.
Use Cases
This model is particularly well-suited for applications requiring precise mathematical problem-solving and reasoning, especially in educational tools, research, or any domain where accurate numerical and logical deduction is critical.