Name: project-free-llama/Llama-3.2-1B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: project-free-llama

Llama 3.2 1B: Multilingual LLM for Dialogue and Agentic Tasks

Meta's Llama 3.2 1B is a 1.23-billion-parameter multilingual large language model from the Llama 3.2 collection. It uses an optimized transformer architecture; the instruction-tuned versions are aligned with human preferences for helpfulness and safety through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).

Key Capabilities & Features

Multilingual Support: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with training on a broader set of languages.
Optimized for Dialogue: Instruction-tuned for assistant-like chat and agentic applications such as knowledge retrieval, summarization, and mobile AI-powered writing assistants.
Quantization Schemes: Features 4-bit groupwise quantization for weights and 8-bit dynamic quantization for activations, designed for efficient inference on ARM CPU backends, particularly for constrained environments like mobile devices.
Performance: In benchmarks against the BF16 baseline, the SpinQuant and QLoRA quantized variants decode and prefill faster and have a smaller model size and memory footprint.
Training Data: Pretrained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023.
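The quantization scheme described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used for the released models: the symmetric rounding, the group size of 32, and all function names are assumptions made for illustration. It shows the two ideas named in the description: weights share one scale per small group and are stored as 4-bit integers, while activations get an 8-bit scale computed at run time from the values actually seen.

```python
def quantize_weights_groupwise(weights, group_size=32, bits=4):
    """Symmetric groupwise quantization: each group of weights shares one scale.

    group_size=32 and the symmetric scheme are illustrative assumptions.
    """
    qmax = 2 ** (bits - 1) - 1      # 7 for 4-bit
    qmin = -(2 ** (bits - 1))       # -8 for 4-bit
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(qmin, min(qmax, round(w / scale))) for w in group)
    return codes, scales


def dequantize_weights_groupwise(codes, scales, group_size=32):
    """Map integer codes back to approximate float weights."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def quantize_activations_dynamic(activations, bits=8):
    """Dynamic quantization: the scale is derived from the input at run time."""
    qmax = 2 ** (bits - 1) - 1      # 127 for 8-bit
    qmin = -(2 ** (bits - 1))       # -128 for 8-bit
    max_abs = max(abs(a) for a in activations)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(qmin, min(qmax, round(a / scale))) for a in activations]
    return codes, scale


# Hypothetical data: 64 weights form two groups of 32.
weights = [((i * 37) % 101 - 50) / 100 for i in range(64)]
w_codes, w_scales = quantize_weights_groupwise(weights)
restored = dequantize_weights_groupwise(w_codes, w_scales)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

a_codes, a_scale = quantize_activations_dynamic([0.5, -1.0, 0.25])
```

In this sketch the reconstruction error of each weight is bounded by half of its group's scale, which is why smaller groups (more scales) trade a little extra storage for better accuracy.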

Intended Use Cases

Commercial and Research: Intended for commercial and research use in multiple languages.
Agentic Applications: Suited to systems that require retrieval, summarization, and query rewriting.
Constrained Environments: The 1B model, especially with quantization, is designed for deployment on devices with limited compute resources, such as mobile phones.
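A rough back-of-the-envelope calculation shows why quantization matters on constrained devices. The sketch below estimates weight storage only, from the 1.23 billion parameter count stated above. It assumes every parameter is stored at the same bit width, and the per-group scale overhead (one 16-bit scale per 32 weights) is a hypothetical choice for illustration. Real deployments also need memory for activations, the KV cache, and any layers kept at higher precision, so measured sizes will differ.

```python
PARAMS = 1.23e9  # parameter count stated in the model description


def weight_size_gb(num_params, bits_per_weight):
    """Weight storage in gigabytes (1 GB = 1e9 bytes), ignoring all other memory."""
    return num_params * bits_per_weight / 8 / 1e9


bf16_gb = weight_size_gb(PARAMS, 16)  # 16 bits per weight
int4_gb = weight_size_gb(PARAMS, 4)   # 4 bits per weight, no scale overhead

# Assumed overhead: one 16-bit scale shared by each group of 32 weights.
effective_bits = 4 + 16 / 32
int4_with_scales_gb = weight_size_gb(PARAMS, effective_bits)

print(f"BF16 weights:          {bf16_gb:.2f} GB")
print(f"4-bit weights:         {int4_gb:.2f} GB")
print(f"4-bit + group scales:  {int4_with_scales_gb:.2f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the BF16 storage, with the group scales adding a small fixed overhead on top.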
