swadeshb/Llama-3.2-3B-Instruct-AMPO-V1

Hugging Face
Text Generation · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Dec 18, 2025 · Architecture: Transformer

swadeshb/Llama-3.2-3B-Instruct-AMPO-V1 is a 3.2-billion-parameter instruction-tuned causal language model, fine-tuned by swadeshb from Meta's Llama-3.2-3B-Instruct. It was trained with the GRPO method, which is designed to enhance mathematical reasoning, making it suited to applications where precise logical and mathematical problem-solving is crucial.


Model Overview

swadeshb/Llama-3.2-3B-Instruct-AMPO-V1 is an instruction-tuned language model based on Meta's Llama-3.2-3B-Instruct architecture, with 3.2 billion parameters and a 32,768-token context length. It was developed by swadeshb and fine-tuned using the TRL library.

Key Differentiator

The primary distinction of this model lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This approach specifically aims to improve the model's capabilities in mathematical reasoning and complex problem-solving.

Training Details

  • Base Model: meta-llama/Llama-3.2-3B-Instruct
  • Fine-tuning Method: GRPO, as detailed in the DeepSeekMath paper.
  • Frameworks Used: TRL (version 0.23.0), Transformers (version 4.57.1), PyTorch (version 2.8.0+cu126), Datasets (version 3.3.2), Tokenizers (version 0.22.1).
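The core idea of GRPO is to replace a learned value-function baseline with group statistics: several completions are sampled per prompt, scored by a reward function, and each completion's advantage is its reward normalized against the group's mean and standard deviation. A minimal sketch of that normalization step (a simplified illustration of the DeepSeekMath formulation, not the actual TRL implementation used to train this model):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its sampling group.

    GRPO uses the group mean as the baseline, so the advantage of
    completion i is (r_i - mean(r)) / std(r). Advantages always sum
    to zero within a group: completions better than the group average
    are reinforced, worse ones are penalized.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against all-identical rewards
    return [(r - mu) / sigma for r in rewards]

# Example: three completions for one math prompt, scored 0/1 for
# whether the final answer was correct.
print(group_relative_advantages([0.0, 1.0, 1.0]))
```

Because the baseline is computed per group rather than by a critic network, GRPO avoids training a separate value model, which is what makes it practical for resource-constrained fine-tuning of a 3B model.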

Potential Use Cases

  • Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning.
  • Logical Deduction: Suitable for tasks that benefit from enhanced logical processing.
  • Instruction Following: Benefits from its instruction-tuned base, making it responsive to user prompts.
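A minimal inference sketch for these use cases via the Transformers `pipeline` API. The model ID comes from this card; the system prompt and generation settings are illustrative assumptions, not part of the model's documented configuration:

```python
def build_messages(question: str) -> list[dict]:
    """Wrap a math question in the chat-message format expected by
    instruct-tuned models (an assumed system prompt for illustration)."""
    return [
        {"role": "system",
         "content": "You are a careful mathematical reasoner. Show your steps."},
        {"role": "user", "content": question},
    ]

if __name__ == "__main__":
    # Imported lazily so the prompt helper above stays dependency-free.
    # First run downloads the BF16 weights (several GB) from the Hub.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="swadeshb/Llama-3.2-3B-Instruct-AMPO-V1",
        torch_dtype="bfloat16",
    )
    out = generator(build_messages("What is 17 * 24?"), max_new_tokens=256)
    # Recent transformers versions return the full chat history; the
    # assistant's reply is the last message.
    print(out[0]["generated_text"][-1]["content"])
```

The pipeline applies the model's bundled chat template automatically, so the instruction-tuned formatting inherited from Llama-3.2-3B-Instruct is preserved without manual prompt assembly.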