Kazuki1450/Llama-3.2-3B-Instruct_geo_3_6_clean_1p0_0p0_1p0_grpo_42_rule

Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 16, 2026 · Architecture: Transformer

Kazuki1450/Llama-3.2-3B-Instruct_geo_3_6_clean_1p0_0p0_1p0_grpo_42_rule is a 3.2 billion parameter instruction-tuned language model, fine-tuned from meta-llama/Llama-3.2-3B-Instruct. It was trained with GRPO, the reinforcement-learning method introduced in the DeepSeekMath paper for mathematical reasoning. The model is optimized for tasks that require enhanced reasoning, particularly in mathematical contexts, and supports a context length of 32768 tokens.


Model Overview

This model, developed by Kazuki1450, is an instruction-tuned variant of the meta-llama/Llama-3.2-3B-Instruct base model, featuring 3.2 billion parameters and a context length of 32768 tokens. It was fine-tuned using the TRL framework.

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It leverages GRPO (Group Relative Policy Optimization), a technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests an optimization for tasks that benefit from advanced reasoning, particularly in mathematical domains.
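The core idea of GRPO can be illustrated with a short sketch. GRPO drops the learned value-function baseline of PPO: for each prompt it samples a group of completions, scores them with a reward (here, the card's name suggests a rule-based reward), and normalizes each reward against the group's mean and standard deviation. The function below is a minimal, illustrative implementation of that group-relative advantage step, not the exact code used to train this model; the `eps` term is an assumed numerical-stability constant.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of sampled completions.

    Instead of a learned value baseline, each completion's reward is
    normalized against the other completions sampled for the same
    prompt: A_i = (r_i - mean(r)) / (std(r) + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 sampled completions for one prompt, scored 1.0 if a
# rule-based checker accepts the answer and 0.0 otherwise.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
# Above-mean completions get positive advantage, below-mean negative.
```

These advantages then weight the token-level policy-gradient update, so completions that beat their group's average are reinforced and the rest are suppressed.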

Capabilities

  • Instruction Following: Designed to respond to user instructions effectively, building upon its Llama-3.2-3B-Instruct foundation.
  • Enhanced Reasoning: GRPO training targets improved performance on tasks requiring logical and mathematical reasoning.

Usage

This model is suitable for applications where a compact yet capable instruction-tuned model with a focus on reasoning is beneficial. Developers can integrate it using the Hugging Face transformers library for text generation tasks.
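A minimal usage sketch with the standard `transformers` text-generation pipeline follows. The repo id comes from this card; everything else (the example prompt, `max_new_tokens`, `device_map`) is a generic assumption, and running it downloads the model weights.

```python
import torch
from transformers import pipeline

model_id = "Kazuki1450/Llama-3.2-3B-Instruct_geo_3_6_clean_1p0_0p0_1p0_grpo_42_rule"

# The card lists BF16 weights, so bfloat16 is a natural dtype choice.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "If 3x + 5 = 20, what is x?"},
]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])
```

Since the model is instruction-tuned, passing a chat-style `messages` list lets the pipeline apply the Llama chat template automatically.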