ZYLIM/qwen3-4b-quickreply-lora
ZYLIM/qwen3-4b-quickreply-lora is a 4-billion-parameter, Qwen3-based language model fine-tuned to generate short, context-aware chat replies. Developed by ZYLIM for the ChatNow quick-reply suggestion app, it excels at mirroring casual chat styles across English, Malay, and Chinese, including short-forms, code-switching, and discourse particles. It is optimized to produce concise, varied one-liner responses in conversational contexts.
Model Overview
This model, ZYLIM/qwen3-4b-quickreply-lora, is a LoRA fine-tune of the Qwen/Qwen3-4B base model, designed to generate short, context-aware chat replies. The LoRA adapter is fused into the base weights at 50% strength, so no separate adapter file is needed: the model loads directly with mlx-lm or any other loader that pulls from the Hugging Face Hub and supports the Qwen3 architecture. It was developed as part of the WID3002 NLP project for the ChatNow quick-reply suggestion app.
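The fusion step can be illustrated with a toy example. This is a minimal sketch, not the project's actual fuse script: it assumes that fusing "at 50%" means the low-rank update B @ A is multiplied by 0.5 before being added to the frozen weight W, and it ignores any alpha/rank scaling a real trainer may also fold in. The helper names `matmul` and `fuse_lora` are hypothetical.

```python
# Toy illustration of fusing a LoRA update into a base weight matrix.
# ASSUMPTION: fusing "at 50%" is read as scaling the low-rank delta
# B @ A by 0.5; a real fuse step may also fold in an alpha/rank factor.

Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of x times columns of y)."""
    inner = len(y)
    if any(len(row) != inner for row in x):
        raise ValueError("inner dimensions do not match")
    cols = len(y[0])
    return [[sum(row[k] * y[k][j] for k in range(inner)) for j in range(cols)] for row in x]


def fuse_lora(w: Matrix, b: Matrix, a: Matrix, scale: float = 0.5) -> Matrix:
    """Return W + scale * (B @ A): the weight a fused checkpoint would store."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 base weight
    b = [[2.0], [4.0]]            # d_out x r, rank r = 1
    a = [[1.0, 1.0]]              # r x d_in
    print(fuse_lora(w, b, a))     # [[2.0, 1.0], [2.0, 3.0]]
```

Because the scaled delta is baked into W ahead of time, inference needs no adapter code, which is why a fused checkpoint loads like an ordinary Qwen3 model.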
Key Capabilities
- Context-aware Reply Generation: Produces three distinct one-liner replies given a short conversation.
- Language Mirroring: Matches the language (English, Malay, Chinese) and preserves short-forms, abbreviations, particles (e.g., lah, lor), and code-switching common in Malaysian chats.
- Varied Conversational Moves: Generates replies with different angles, such as direct answers, clarifying questions, proposals, opinions, or redirects.
- Improved Reply Length: Significantly reduces over-generation compared to the base model, producing replies closer to reference length.
- Enhanced Casual Tone: Fine-tuned to adopt a casual, particle-aware tone, unlike the more formal base model.
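The card does not document the raw output format, so the following is a hypothetical post-processing sketch. It assumes the model emits its three replies one per line, optionally prefixed with "1.", "2)", "-" or "*", and keeps at most three distinct replies, comparing them case-insensitively. `parse_replies` is an illustrative name, not part of any released API.

```python
import re

# HYPOTHETICAL output format: one reply per line, optionally prefixed with
# "1.", "2)", "-" or "*". Adjust the pattern to match real model outputs.
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+")


def parse_replies(raw: str, n: int = 3) -> list[str]:
    """Return at most n distinct, non-empty one-line replies from raw text."""
    replies: list[str] = []
    seen: set[str] = set()
    for line in raw.splitlines():
        reply = _MARKER.sub("", line, count=1).strip()
        key = reply.casefold()  # case-insensitive duplicate check only
        if reply and key not in seen:
            seen.add(key)
            replies.append(reply)
        if len(replies) == n:
            break
    return replies


if __name__ == "__main__":
    raw = "1. ok can lah\n2. what time ah?\n3. OK can lah\n- tmr better?\n"
    print(parse_replies(raw))  # ['ok can lah', 'what time ah?', 'tmr better?']
```

Requiring whitespace after a numeric marker keeps replies such as "3.30 ok?" intact, and casefolding is used only for duplicate detection, so short-forms and particles are returned exactly as generated.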
Performance Highlights
Evaluated on a 100-example held-out chat set, the fine-tuned model shows substantial improvements over the base model:
- Overall BLEU score (0 to 100 scale): Increased from 0.34 to 8.48 (about 25x improvement).
- Overall ROUGE-L F1 score: Increased from 0.060 to 0.484 (8.1x improvement).
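ROUGE-L F1 scores a reply by the longest common subsequence (LCS) of tokens it shares with the reference. The sketch below assumes simple lowercase whitespace tokenization; the project's evaluation may have used a library with its own tokenizer, so its numbers need not match this code exactly. Whitespace splitting also does not segment Chinese text, which would need a word or character tokenizer first.

```python
# Simplified ROUGE-L F1: lowercase + whitespace tokenization (an ASSUMPTION;
# the project's evaluation tooling may tokenize differently).


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence via O(len(a) * len(b)) DP."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of LCS-based precision and recall."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0  # also covers an empty candidate or reference
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = rouge_l_f1("ok can lah see you", "ok can see you there")
    print(round(score, 3))  # 0.8
```

As a sanity check on the quoted ratios, 8.48 / 0.34 is about 24.9 and 0.484 / 0.060 is about 8.07, consistent with the "25x" and "8.1x" figures.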
Limitations
- Targeted Fine-tuning: The LoRA adapter targets only the top 16 transformer blocks, so deeper semantic reasoning still relies on the unmodified lower blocks of the base model.
- Specific Use Case: Optimized exclusively for chat-reply generation; not suitable for tool use, code generation, or long document tasks.
- Short-form Coverage: Works best for Malay and casual English short-forms; handling of Mandarin internet slang is inherited from the base model rather than learned during fine-tuning.
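To make "top 16 transformer blocks" concrete, the helper below lists the decoder-block indices that would carry LoRA weights, counting block 0 as the one nearest the embeddings. It assumes Qwen3-4B has 36 decoder layers; confirm this against `num_hidden_layers` in the model's config.json. The function and constant names are hypothetical.

```python
# ASSUMPTION: Qwen3-4B has 36 decoder layers; check num_hidden_layers in config.json.
QWEN3_4B_LAYERS = 36


def adapted_layers(num_hidden_layers: int, top_n: int = 16) -> range:
    """Indices of the top_n blocks nearest the output (block 0 is nearest the embeddings)."""
    if not 0 < top_n <= num_hidden_layers:
        raise ValueError("top_n must be between 1 and num_hidden_layers")
    return range(num_hidden_layers - top_n, num_hidden_layers)


if __name__ == "__main__":
    top = adapted_layers(QWEN3_4B_LAYERS)
    print(top.start, top.stop - 1)  # 20 35
```

Under that assumption, blocks 0 to 19 keep their original base-model weights, which is where the lower-level representations the limitation refers to are computed.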