friday-and-co/Qwen3.5-4B
friday-and-co/Qwen3.5-4B is a 4.5 billion parameter causal language model based on the Qwen3.5 architecture, developed by friday-and-co. It is an otherwise verbatim copy of Qwen/Qwen3.5-4B with one addition: a generation_config.json file. This file ensures that multi-turn generation terminates correctly by explicitly defining both <|im_end|> and <|endoftext|> as stop tokens, preventing runaway generation in chat and tool-use scenarios.
Qwen3.5-4B: Enhanced Generation Configuration
This model, friday-and-co/Qwen3.5-4B, is a 4.5 billion parameter variant of the Qwen3.5 architecture. Its only difference from the upstream Qwen/Qwen3.5-4B is the inclusion of a generation_config.json file. Without that file, inference engines may fail to stop at the end of a chat turn, which breaks multi-turn and tool-use applications.
Key Enhancements:
- Correct Stop Token Handling: The added generation_config.json explicitly defines [248046, 248044] as eos_token_id, corresponding to <|im_end|> and <|endoftext|>. This ensures that inference engines correctly recognize both chat turn terminators.
- Prevents Runaway Generation: Without this configuration, engines like SGLang or vLLM would default to only <|endoftext|> as the stop token, leading to continuous, unwanted generation after a chat turn or tool-use prompt.
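As a rough illustration of the field described above, the sketch below builds a minimal generation_config.json payload and compares its stop set with the <|endoftext|>-only fallback. Only the eos_token_id values come from this card. The helper normalize_eos and the one-field payload are assumptions made for the example, and the real file may contain other fields.

```python
import json

# Token IDs as stated in this model card.
IM_END_ID = 248046      # <|im_end|>
ENDOFTEXT_ID = 248044   # <|endoftext|>


def normalize_eos(value):
    """Return eos_token_id as a set, accepting None, a single int, or a list.

    Hypothetical helper for this example, not part of any library.
    """
    if value is None:
        return set()
    if isinstance(value, int):
        return {value}
    return set(value)


# Assumed minimal file content: only the field this card describes.
config_text = json.dumps({"eos_token_id": [IM_END_ID, ENDOFTEXT_ID]}, indent=2)
stop_ids = normalize_eos(json.loads(config_text)["eos_token_id"])

# Stop set an engine would use if it only knew about <|endoftext|>.
fallback_stop_ids = normalize_eos(ENDOFTEXT_ID)

print(config_text)
print("stop token missing without the config:", sorted(stop_ids - fallback_stop_ids))
```

The set difference contains only the <|im_end|> ID, which is the terminator that goes unrecognized when the configuration is absent.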
Ideal Use Cases:
- Multi-turn Chatbots: Ensures proper conversation flow and termination after each user or assistant turn.
- Tool-use Agents: Facilitates accurate response parsing by stopping generation at the intended end of a tool call or response.
- Applications requiring precise generation control: Any scenario where explicit control over generation termination is crucial for correct model behavior.
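To make the effect on these use cases concrete, here is a small sketch of how a decoding loop might cut output at the first stop token. The function truncate_at_stop and the token stream are invented for illustration; real engines apply the stop check during generation rather than after it.

```python
IM_END_ID = 248046      # <|im_end|>, per this model card
ENDOFTEXT_ID = 248044   # <|endoftext|>, per this model card


def truncate_at_stop(token_ids, stop_ids):
    """Return the tokens before the first stop token, or all tokens if none occurs.

    Hypothetical helper for this example, not an engine API.
    """
    for i, tok in enumerate(token_ids):
        if tok in stop_ids:
            return list(token_ids[:i])
    return list(token_ids)


# Made-up stream: an assistant turn, <|im_end|>, then unwanted continuation.
stream = [101, 102, 103, IM_END_ID, 201, 202, ENDOFTEXT_ID]

with_both = truncate_at_stop(stream, {IM_END_ID, ENDOFTEXT_ID})
with_endoftext_only = truncate_at_stop(stream, {ENDOFTEXT_ID})

print("both stop tokens:", with_both)
print("<|endoftext|> only:", with_endoftext_only)
```

With both stop tokens the output ends at the turn boundary. With only <|endoftext|>, the <|im_end|> marker and the tokens after it leak into the result, which is what a chat or tool-call parser would then have to deal with.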