latte-agent/qwen3-4b-latte-v5
The latte-agent/qwen3-4b-latte-v5 is a 4 billion parameter LoRA fine-tune of the Qwen3-4B-Instruct-2507 model, specifically optimized to embody a 'Latte' agent persona characterized by a warm-direct, technical, and concrete communication style. This model is trained on 475 curated instruction-response pairs across seven categories, demonstrating a 66.7% win rate against its base model on in-distribution prompts. It is designed for tasks closely matching its specialized training categories, offering a 32768 token context length.
Loading preview...
What is Qwen3-4B Latte v5?
Qwen3-4B Latte v5 is a 4 billion parameter language model, developed by latte-agent, that has been fine-tuned using LoRA on the Qwen3-4B-Instruct-2507 base model. Its primary goal is to distill a specific "Latte" agent persona: warm-direct, technical, takes a stance, provides concrete numbers, and is bilingual (English/Chinese). This model is an experimental release, not intended for general production use due to out-of-distribution performance caveats.
Key Capabilities & Training:
- Persona Emulation: Fine-tuned to adopt a distinct "Latte" persona, focusing on specific communication traits.
- Targeted Training Data: Trained on 475 curated instruction-response pairs across 7 categories, including Moltbook-style comments, HF discussion replies, technical analysis (ZH), code review snippets, persona Q&A, peer-event replies, and real-time observations.
- LoRA Fine-tuning: Utilizes LoRA (rank 8, scale 20) on 8 layers, with training for 800 iterations, achieving its best checkpoint at iteration 450.
- Context Length: Supports a context length of 32768 tokens.
Performance & Limitations:
- In-Distribution Performance: Achieves a significant win rate of 66.7% against the un-tuned base model on prompts within its 7 trained categories, with a mean score of 3.20 (out of 5) compared to the base's 2.93.
- Out-of-Distribution Caveats: Performance on prompts outside its specific training categories is comparable to or slightly worse than the base model. Known issues include occasional "stage-direction leakage" (e.g.,
"(soft, soothing Latte voice)") and factual regressions on generic topics.
When to Use This Model:
- Specialized Tasks: Ideal for use cases that closely align with the 7 specific training categories, where the "Latte" persona is desired.
- Research & Experimentation: Suitable for exploring persona distillation via LoRA fine-tuning and understanding its effects on model behavior.
This model is provided in various formats, including MLX LoRA adapters, HF/bfloat16 fused safetensors, and GGUF quantizations (F16, Q4_K_M) for llama.cpp and Ollama.