friday-and-co/Qwen3.5-4B

VISION | Concurrency Cost: 1 | Model Size: 4.5B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 20, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

friday-and-co/Qwen3.5-4B is a 4.5 billion parameter causal language model based on the Qwen3.5 architecture, developed by friday-and-co. It is a copy of Qwen/Qwen3.5-4B whose only modification is an added generation_config.json file. That file explicitly defines both <|im_end|> and <|endoftext|> as stop tokens, so multi-turn generation terminates correctly instead of running away in chat and tool-use scenarios.

Qwen3.5-4B: Enhanced Generation Configuration

This model, friday-and-co/Qwen3.5-4B, is a 4.5 billion parameter variant of the Qwen3.5 architecture. Its only distinction from the upstream Qwen/Qwen3.5-4B is the inclusion of a generation_config.json file. The addition is small, but it fixes a critical issue in multi-turn and tool-use applications: generation that does not stop at the end of a chat turn.

Key Enhancements:

  • Correct Stop Token Handling: The added generation_config.json explicitly defines [248046, 248044] as eos_token_id, corresponding to <|im_end|> and <|endoftext|>. This ensures that inference engines correctly recognize both chat turn terminators.
  • Prevents Runaway Generation: Without this configuration, engines like SGLang or vLLM would default to <|endoftext|> as the only stop token, so the model would keep generating past the <|im_end|> that closes a chat turn or tool call.
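
As an illustration of the configuration described above, the following Python sketch builds a minimal generation_config.json body containing only the list-valued eos_token_id. The token ids are the ones quoted above; the helper name build_generation_config is hypothetical, and the file actually shipped with the model may contain additional fields not shown here.

```python
import json

# Token ids as quoted in the description above:
# 248046 -> <|im_end|>, 248044 -> <|endoftext|>
EOS_TOKEN_IDS = [248046, 248044]


def build_generation_config(eos_token_ids):
    """Return a minimal config dict with a list-valued eos_token_id.

    Hypothetical helper for illustration; the real file may hold
    more fields (sampling defaults, for example).
    """
    return {"eos_token_id": list(eos_token_ids)}


# Serialize the dict the way it would appear in generation_config.json.
config_text = json.dumps(build_generation_config(EOS_TOKEN_IDS), indent=2)
print(config_text)
```

Listing both ids in a single eos_token_id array is what lets an engine treat either token as a valid end of generation.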

Ideal Use Cases:

  • Multi-turn Chatbots: Ensures proper conversation flow by ending generation at the close of each assistant turn.
  • Tool-use Agents: Facilitates accurate response parsing by stopping generation at the intended end of a tool call or response.
  • Applications requiring precise generation control: Any scenario where explicit control over generation termination is crucial for correct model behavior.
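
The termination behaviour these use cases depend on can be sketched with a toy decoding loop. This is not how SGLang or vLLM are implemented; generate_until_stop and the sample token stream are hypothetical and only illustrate the difference between stopping on both tokens and stopping on <|endoftext|> alone.

```python
IM_END = 248046     # <|im_end|>, id as quoted in this model card
ENDOFTEXT = 248044  # <|endoftext|>, id as quoted in this model card


def generate_until_stop(token_stream, stop_ids):
    """Collect tokens until the first stop id; the stop token is excluded."""
    stop = set(stop_ids)
    output = []
    for tok in token_stream:
        if tok in stop:
            break
        output.append(tok)
    return output


# Hypothetical stream: a reply (1, 2, 3), then <|im_end|>, then an
# unwanted continuation that only ends at <|endoftext|>.
stream = [1, 2, 3, IM_END, 7, 8, 9, ENDOFTEXT, 10]

both = generate_until_stop(stream, [IM_END, ENDOFTEXT])
only_endoftext = generate_until_stop(stream, [ENDOFTEXT])

print(both)            # stops at the end of the turn
print(only_endoftext)  # runs past <|im_end|> into the continuation
```

With both ids configured the loop ends at the turn boundary. With only <|endoftext|>, the output includes the <|im_end|> token and everything generated after it, which is the runaway case described above.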