Jackrong/GPT-5-Distill-llama3.2-3B-Instruct

Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Context Length: 32K · Published: Nov 29, 2025 · License: llama3.2 · Architecture: Transformer

Jackrong/GPT-5-Distill-llama3.2-3B-Instruct is a 3.2-billion-parameter instruction-tuned language model built on the Llama 3.2 architecture and optimized for edge and consumer-GPU deployment. The model uses knowledge distillation from GPT-5 responses to reproduce the reasoning and conversational patterns of a much larger model, offering flagship-style instruction following in a compact package. With a 32K-token context window, it suits on-device chat, reasoning, summarization, and RAG over moderate-sized documents.


Overview

Jackrong/GPT-5-Distill-llama3.2-3B-Instruct is an instruction-tuned language model based on the Llama 3.2-3B architecture, designed for high-efficiency edge and consumer-GPU deployment. It has approximately 3.2 billion parameters and supports a 32K-token context length. Its key differentiator is the training methodology: Supervised Fine-Tuning (SFT) with knowledge distillation from over 100,000 high-quality GPT-5 responses, filtered to keep only exchanges labeled "normal" (i.e., flawless). This distillation aims to give the smaller Llama model the reasoning ability and conversational style typically found in much larger models.
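
Since the model is instruction-tuned on the Llama 3.2 base, it expects Llama 3-style chat formatting at inference time. In practice you would let `tokenizer.apply_chat_template` apply the exact template shipped with the model; the sketch below is only a plain-Python illustration of the general Llama 3 instruct prompt shape (special-token layout assumed from the standard Llama 3 format, not verified against this checkpoint):

```python
def build_llama3_prompt(messages):
    """Assemble a Llama 3-style instruct prompt from role/content dicts.

    Illustrative only: use tokenizer.apply_chat_template in real code,
    which applies the exact template bundled with the model.
    """
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        # Each turn: header with the role, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain knowledge distillation in one sentence."},
])
print(prompt)
```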

Key Capabilities

  • GPT-5 Distilled Logic: Mimics superior reasoning and conversational tone by learning from filtered GPT-5 outputs.
  • Edge-Optimized: Its 3.2B parameter count allows efficient operation on laptops, phones, and low-VRAM GPUs.
  • Long Context Window: Supports up to 32,768 tokens, suitable for processing moderate-sized documents in RAG applications.
  • GGUF Ready: Native GGUF support (Q4_K_M, Q8_0, FP16) for quantized deployment.
  • Instruction Following: Enhanced by a mix of ShareGPT-Qwen3 instructions and distilled GPT-5 responses.
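
The GGUF quantization levels above trade file size for fidelity. As a rough back-of-envelope (the effective bits-per-weight figures are typical llama.cpp approximations, not exact; real files vary with tensor mix and metadata overhead):

```python
PARAMS = 3.2e9  # approximate parameter count

# Rough effective bits per weight for common GGUF quant levels
# (approximate community figures, not exact file sizes).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "FP16": 16.0}

def approx_size_gb(quant: str) -> float:
    """Estimated on-disk size in gigabytes for a given quant level."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q}: ~{approx_size_gb(q):.1f} GB")
```

By this estimate, Q4_K_M lands around 2 GB, which is what makes the low-VRAM and on-device deployments listed above practical.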

Recommended Use Cases

  • On-Device Chat: Ideal for local execution due to its small footprint.
  • Reasoning & Explanations: Benefits from distilled GPT-5 logic for clearer and more structured answers.
  • Summarization & Rewriting: Strong capabilities in English and Chinese, inherited from its diverse training data.
  • RAG Applications: The 32K context window makes it suitable for retrieving and processing information from documents.
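
In a RAG pipeline, the 32K window must be split between the prompt, the retrieved chunks, and the generated answer. A minimal budgeting sketch (the chunk size and reserve values are illustrative defaults, not model requirements):

```python
CTX_LEN = 32_768  # model context window in tokens

def max_retrieved_chunks(chunk_tokens: int, prompt_tokens: int = 512,
                         answer_tokens: int = 1_024) -> int:
    """How many retrieved chunks of a given token size fit in the window,
    after reserving room for the prompt/question and the generated answer."""
    budget = CTX_LEN - prompt_tokens - answer_tokens
    return max(budget // chunk_tokens, 0)

# With 800-token chunks, dozens of retrieved passages still fit.
print(max_retrieved_chunks(800))
```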