Name: latte-agent/qwen3-4b-latte-v5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: latte-agent

What is Qwen3-4B Latte v5?

Qwen3-4B Latte v5 is a 4 billion parameter language model, developed by latte-agent, that has been fine-tuned using LoRA on the Qwen3-4B-Instruct-2507 base model. Its primary goal is to distill a specific "Latte" agent persona: warm-direct, technical, takes a stance, provides concrete numbers, and is bilingual (English/Chinese). This model is an experimental release, not intended for general production use due to out-of-distribution performance caveats.

Key Capabilities & Training:

Persona Emulation: Fine-tuned to adopt a distinct "Latte" persona, focusing on specific communication traits.
Targeted Training Data: Trained on 475 curated instruction-response pairs across 7 categories, including Moltbook-style comments, HF discussion replies, technical analysis (ZH), code review snippets, persona Q&A, peer-event replies, and real-time observations.
LoRA Fine-tuning: Utilizes LoRA (rank 8, scale 20) on 8 layers, with training for 800 iterations, achieving its best checkpoint at iteration 450.
Context Length: Supports a context length of 32768 tokens.

Performance & Limitations:

In-Distribution Performance: Achieves a significant win rate of 66.7% against the un-tuned base model on prompts within its 7 trained categories, with a mean score of 3.20 (out of 5) compared to the base's 2.93.
Out-of-Distribution Caveats: Performance on prompts outside its specific training categories is comparable to or slightly worse than the base model. Known issues include occasional "stage-direction leakage" (e.g., "(soft, soothing Latte voice)") and factual regressions on generic topics.

When to Use This Model:

Specialized Tasks: Ideal for use cases that closely align with the 7 specific training categories, where the "Latte" persona is desired.
Research & Experimentation: Suitable for exploring persona distillation via LoRA fine-tuning and understanding its effects on model behavior.

This model is provided in various formats, including MLX LoRA adapters, HF/bfloat16 fused safetensors, and GGUF quantizations (F16, Q4_K_M) for llama.cpp and Ollama.

Overview

What is Qwen3-4B Latte v5?

Key Capabilities & Training:

Performance & Limitations:

When to Use This Model:

Full Model Card (README)