Name: meta-llama/Llama-3.2-3B-Instruct API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: meta-llama

Overview

meta-llama/Llama-3.2-3B-Instruct is a 3.21 billion parameter instruction-tuned model from Meta's Llama 3.2 family, optimized for multilingual dialogue and agentic applications. It utilizes an auto-regressive transformer architecture, enhanced with Supervised Fine-Tuning (SFT) and Reinforcement Learning with Human Feedback (RLHF) to align with human preferences for helpfulness and safety. The model supports a 32768 token context length and was trained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023.

Key Capabilities

Multilingual Performance: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with training on a broader set of languages.
Optimized for Dialogue: Specifically designed for assistant-like chat, agentic retrieval, and summarization tasks.
Quantization Options: Available in BF16, SpinQuant, and QLoRA versions, offering significant improvements in inference speed (up to 2.6x decode speed) and reduced memory footprint (up to 60.3% smaller model size) for constrained environments like mobile devices.
Robust Safety Measures: Developed with a three-pronged strategy for trust and safety, including developer enablement, protection against adversarial users, and community misuse prevention.

Good For

Commercial and Research Use: Suitable for a wide range of applications requiring generative AI.
Agentic Applications: Ideal for knowledge retrieval, summarization, mobile AI-powered writing assistants, and query/prompt rewriting.
Resource-Constrained Environments: Quantized versions (SpinQuant, QLoRA) are specifically designed for on-device use-cases with limited compute resources, such as mobile devices.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)