Model Overview
swadeshb/Llama-3.2-3B-Instruct-AMPO-V1 is an instruction-tuned language model based on Meta's Llama-3.2-3B-Instruct, with 3.2 billion parameters and a 32,768-token context length. It was developed by swadeshb and fine-tuned using the TRL library.
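Since the model uses the standard Llama-3.2-Instruct chat format, it can presumably be loaded with the Transformers `pipeline` API like any other instruction-tuned checkpoint. The snippet below is a sketch, not an official quick-start from the model card; the question and generation settings are illustrative assumptions:

```python
MODEL_ID = "swadeshb/Llama-3.2-3B-Instruct-AMPO-V1"

def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat-message format the pipeline expects."""
    return [{"role": "user", "content": question}]

def ask(question: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the module loads even without transformers installed.
    from transformers import pipeline

    # Downloads the ~3B-parameter weights on first call; a GPU is recommended.
    generator = pipeline("text-generation", model=MODEL_ID)
    out = generator(build_messages(question), max_new_tokens=max_new_tokens)
    # The pipeline returns the full chat transcript; keep the last (assistant) turn.
    return out[0]["generated_text"][-1]["content"]
```

For example, `ask("If 3x + 7 = 22, what is x?")` would return the model's answer as a string.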
Key Differentiator
The primary distinction of this model lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This approach specifically aims to improve the model's capabilities in mathematical reasoning and complex problem-solving.
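The core idea of GRPO is that, instead of training a separate value model, it scores each sampled completion relative to the other completions generated for the same prompt: the advantage is the reward minus the group mean, divided by the group standard deviation. A minimal sketch of that normalization (using population standard deviation; the exact estimator is an implementation detail of the trainer, not something stated here):

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its group's mean and std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    if std == 0:
        # All completions scored equally: no relative signal to learn from.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Four completions sampled for one prompt, scored by a reward function
# (e.g. 1.0 for a correct final answer, 0.0 otherwise):
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # → [1.0, -1.0, 1.0, -1.0]
```

Correct completions get positive advantages and incorrect ones negative, so the policy is pushed toward the better answers within each group without needing a learned critic.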
Training Details
- Base Model: meta-llama/Llama-3.2-3B-Instruct
- Fine-tuning Method: GRPO, as detailed in the DeepSeekMath paper.
- Frameworks Used: TRL 0.23.0, Transformers 4.57.1, PyTorch 2.8.0+cu126, Datasets 3.3.2, Tokenizers 0.22.1.
Potential Use Cases
- Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning.
- Logical Deduction: Suitable for tasks that benefit from step-by-step logical reasoning.
- Instruction Following: Benefits from its instruction-tuned base, making it responsive to user prompts.