swadeshb/Llama-3.2-3B-Instruct-CRPO-V1
swadeshb/Llama-3.2-3B-Instruct-CRPO-V1 is a 3-billion-parameter instruction-tuned language model fine-tuned from meta-llama/Llama-3.2-3B-Instruct using the GRPO training method, which was originally introduced to improve mathematical reasoning in large language models. With a 32,768-token context window, it is suited to generating coherent, contextually relevant responses to user instructions.
Model Overview
swadeshb/Llama-3.2-3B-Instruct-CRPO-V1 is a 3-billion-parameter instruction-tuned language model built on the meta-llama/Llama-3.2-3B-Instruct base. It was fine-tuned with the TRL library using the GRPO (Group Relative Policy Optimization) training method.
Key Characteristics
- Base Model: Fine-tuned from meta-llama/Llama-3.2-3B-Instruct.
- Training Method: Employs GRPO, a technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests an emphasis on improving reasoning capabilities, potentially beyond purely mathematical contexts.
- Frameworks: Trained with TRL (version 0.23.0), Transformers (version 4.57.1), PyTorch (version 2.8.0+cu126), Datasets (version 3.3.2), and Tokenizers (version 0.22.1).
- Context Length: Supports a substantial context window of 32768 tokens.
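Since the model follows the standard Llama 3.2 instruct conventions, it can presumably be loaded through the Hugging Face Transformers `pipeline` API. The sketch below is a minimal, hedged example: the model id comes from this card, but the generation parameters and the system prompt are illustrative assumptions, not values from the training setup.

```python
MODEL_ID = "swadeshb/Llama-3.2-3B-Instruct-CRPO-V1"  # model id from this card


def build_messages(user_prompt: str,
                   system_prompt: str = "You are a helpful assistant.") -> list[dict]:
    """Build the chat-format message list that Llama 3.2 instruct models expect."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Run one instruction through the model and return the assistant reply."""
    # Heavy import deferred so the module can be used without transformers installed.
    from transformers import pipeline

    # Downloads ~3B parameters on first use; a GPU is advisable for practical speed.
    pipe = pipeline("text-generation", model=MODEL_ID,
                    torch_dtype="auto", device_map="auto")
    out = pipe(build_messages(prompt), max_new_tokens=max_new_tokens)
    # Chat-format pipelines return the full message list; the last entry is the reply.
    return out[0]["generated_text"][-1]["content"]


if __name__ == "__main__":
    print(generate("Summarize what GRPO training does, in two sentences."))
```

The guard under `__main__` keeps the expensive model download out of import time, so the message-building helper can be reused or tested independently.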
Potential Use Cases
- Instruction Following: Designed to respond effectively to user prompts and instructions.
- Reasoning Tasks: The GRPO training method, while originating from mathematical reasoning, may enhance general reasoning abilities.
- Conversational AI: Suitable for generating coherent and contextually appropriate dialogue.
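For the conversational use case, dialogue is typically carried as a growing list of role-tagged messages. The sketch below assumes the Transformers text-generation pipeline and the Llama 3.2 chat message format; the `chat_turn` helper and its defaults are illustrative, not part of this model's release.

```python
MODEL_ID = "swadeshb/Llama-3.2-3B-Instruct-CRPO-V1"  # model id from this card


def chat_turn(history: list[dict], user_msg: str, pipe=None) -> list[dict]:
    """Append a user turn, generate the assistant reply, and return the new history.

    `pipe` may be any callable with the pipeline's signature, which also makes
    the helper easy to exercise with a stub instead of the real 3B model.
    """
    history = history + [{"role": "user", "content": user_msg}]
    if pipe is None:
        from transformers import pipeline  # heavy import deferred until needed
        pipe = pipeline("text-generation", model=MODEL_ID, device_map="auto")
    # Chat-format pipelines return the full message list; take the last reply.
    reply = pipe(history, max_new_tokens=256)[0]["generated_text"][-1]["content"]
    return history + [{"role": "assistant", "content": reply}]
```

Keeping the full history in the message list is what lets the model stay contextually consistent across turns, and the 32,768-token window leaves ample room before older turns need to be truncated.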