rinnic/llama3_2_3B-practice-area-ft-125k-1epochs

Hosted on Hugging Face · Text Generation
Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Aug 27, 2025 · License: llama3.2 · Architecture: Transformer · Status: Warm

rinnic/llama3_2_3B-practice-area-ft-125k-1epochs is a fine-tuned variant of Meta's Llama 3.2 3B, a 3.2-billion-parameter, instruction-tuned, multilingual text-only model optimized for dialogue use cases such as agentic retrieval and summarization. It uses an optimized transformer architecture with Grouped-Query Attention (GQA) and supports a 32,768-token context length, making it suitable for applications requiring efficient multilingual processing.
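For dialogue use, prompts must follow the model's chat template. A minimal sketch of the standard Llama 3 instruct template (an assumption for this fine-tune; in real code, prefer the tokenizer's `apply_chat_template`, which reads the template shipped with the model):

```python
def llama3_prompt(system: str, user: str) -> str:
    """Assemble a raw single-turn prompt in the standard Llama 3 instruct
    template. The special tokens below are the documented Llama 3 ones;
    whether this fine-tune overrides them is an assumption, so prefer
    tokenizer.apply_chat_template in production code."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        # Trailing assistant header cues the model to generate the reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_prompt("You are a helpful assistant.", "Summarize this document."))
```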


Model Overview

This model, rinnic/llama3_2_3B-practice-area-ft-125k-1epochs, is a fine-tuned 3.2-billion-parameter variant from Meta's Llama 3.2 family. It is an instruction-tuned, multilingual text-only model built on an optimized transformer architecture with Grouped-Query Attention (GQA) for improved inference scalability. The base model was trained on up to 9 trillion tokens of publicly available data with a knowledge cutoff of December 2023, and incorporates knowledge distillation from larger Llama 3.1 models during pretraining.
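GQA improves inference scalability by letting several query heads share a single key/value head, shrinking the KV cache by the sharing factor. A minimal numpy sketch of the mechanism (the head counts below are illustrative, not this model's actual configuration):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d),
    with n_heads an exact multiple of n_kv_heads."""
    n_heads, seq, d = q.shape
    group = n_heads // k.shape[0]
    # Each key/value head serves `group` query heads, so the KV cache
    # holds n_kv_heads entries instead of n_heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ v

# Illustrative sizes: 8 query heads sharing 2 KV heads (4x smaller KV cache).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16, 64))
k = rng.standard_normal((2, 16, 64))
v = rng.standard_normal((2, 16, 64))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 16, 64)
```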

Key Capabilities

  • Multilingual Dialogue: Optimized for multilingual chat and agentic applications, including retrieval and summarization tasks.
  • Quantization Support: Features various quantization schemes (SpinQuant, QAT + LoRA) designed for efficient deployment in constrained environments like mobile devices, significantly reducing model size and improving inference speed.
  • Long Context: Supports a context length of 32768 tokens, enabling processing of extensive inputs.
  • Safety Alignment: Developed with a focus on responsible AI, incorporating supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) for alignment with human preferences for helpfulness and safety.
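SpinQuant and QAT + LoRA are more sophisticated than can be shown here, but the core idea behind weight quantization can be sketched with plain symmetric int8 rounding, which already cuts storage 4x versus FP32 (2x versus the BF16 weights this card lists):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale.
    (A teaching sketch only; SpinQuant and QAT + LoRA go well beyond this.)"""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes // q.nbytes)  # 4: int8 weights take a quarter of the FP32 bytes
# Round-to-nearest keeps the reconstruction error within one quantization step.
print(float(np.abs(dequantize(q, scale) - w).max()) <= scale)  # True
```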

Good for

  • Assistant-like Chatbots: Well suited to powering interactive, assistant-style conversational agents.
  • Agentic Applications: Ideal for tasks involving knowledge retrieval and summarization.
  • Mobile AI: Quantized versions are specifically designed for on-device use cases with limited compute resources.
  • Multilingual Deployments: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with potential for fine-tuning in other languages.
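A hedged deployment sketch using the Hugging Face transformers text-generation pipeline (assumes `transformers` and a PyTorch backend are installed; the model id comes from this card, everything else is illustrative):

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    """Chat-format messages as accepted by transformers text-generation pipelines."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def run_chat(user_prompt: str) -> str:
    """Heavy path: downloads several GB of BF16 weights on first call, so the
    transformers import is deferred and nothing runs at module import time."""
    from transformers import pipeline
    generator = pipeline(
        "text-generation",
        model="rinnic/llama3_2_3B-practice-area-ft-125k-1epochs",  # id from this card
        torch_dtype="bfloat16",
    )
    messages = build_messages("You are a helpful multilingual assistant.", user_prompt)
    result = generator(messages, max_new_tokens=128)
    # With chat-style input, generated_text is the message list plus the reply.
    return result[0]["generated_text"][-1]["content"]
```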