clzoro/Qwen3.5-9B-Claude-Distill-v2

  • Tags: Vision, Open Weights
  • Concurrency Cost: 1
  • Model Size: 9B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Apr 25, 2026
  • License: apache-2.0
  • Architecture: Transformer

clzoro/Qwen3.5-9B-Claude-Distill-v2 is a 9-billion-parameter language model fine-tuned by clzoro from Qwen3.5-9B, with a 32K context length. It is trained on Claude-generated data to improve instruction following and reasoning, with a particular focus on mathematical and coding tasks. The model targets complex problem solving and precise instruction execution, which makes it suitable for applications that require strong logical inference.

Overview

clzoro/Qwen3.5-9B-Claude-Distill-v2 is a 9-billion-parameter model built on the Qwen3.5-9B base and fine-tuned on a dataset of Claude-generated conversations. The full supervised fine-tuning (SFT) run uses 125,175 conversation pairs and aims to add stronger instruction-following and reasoning skills while retaining the base model's original capabilities. The training data is heavily weighted toward math (65.5%) and code (15.1%), indicating a strong specialization in these domains.
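The reported mix can be turned into approximate per-domain counts. This is a minimal sketch that assumes the percentages are shares of the 125,175 conversation pairs and are rounded to one decimal place, so the results are estimates rather than the actual split. The "other" bucket simply collects everything that is neither math nor code, and the helper name `approximate_counts` is hypothetical.

```python
# Approximate per-domain counts implied by the reported training mix.
# Assumption: the shares apply to all 125,175 conversation pairs and are
# rounded, so these counts are estimates, not the real dataset split.
TOTAL_PAIRS = 125_175
DOMAIN_SHARE = {"math": 0.655, "code": 0.151}


def approximate_counts(total: int, shares: dict) -> dict:
    """Round each domain's share of `total`; put the remainder in 'other'."""
    counts = {name: round(total * share) for name, share in shares.items()}
    # Hypothetical bucket: everything not covered by the listed domains.
    counts["other"] = total - sum(counts.values())
    return counts


if __name__ == "__main__":
    for name, n in approximate_counts(TOTAL_PAIRS, DOMAIN_SHARE).items():
        print(f"{name:>5}: ~{n:,}")
```

Under these assumptions the mix works out to roughly 81,990 math pairs, 18,901 code pairs, and 24,284 pairs from other domains.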

Key Capabilities

  • Enhanced Instruction Following: Improved ability to understand and execute complex instructions, attributed to distillation from Claude-generated data.
  • Strong Reasoning: Demonstrates advanced logical inference, particularly in mathematical and coding contexts.
  • Default Thinking Mode: By default, the model generates intermediate reasoning ("thinking mode") before its final response; this mode can be disabled when direct answers are preferred.
  • Long Context: Supports a context window of 32,768 tokens, useful for long, multi-step problems.
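Because thinking mode is on by default, callers that only want the final answer need to separate the reasoning from the response. The card does not document the output format, so this sketch assumes the Qwen3 convention of wrapping reasoning in `<think>...</think>` tags, including the variant where the opening tag is emitted by the prompt template and only `</think>` appears in the output. `split_thinking` is a hypothetical helper, not part of any library. Qwen3 chat templates also accept an `enable_thinking` flag for turning the mode off; whether this model's template honors it is an assumption to verify.

```python
from typing import Tuple

OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def split_thinking(text: str) -> Tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    Assumption: reasoning ends with a </think> tag (Qwen3-style). If no
    closing tag is present, e.g. because thinking mode was disabled, the
    reasoning is empty and the whole text is treated as the answer.
    """
    end = text.find(CLOSE_TAG)
    if end == -1:
        return "", text.strip()
    start = text.find(OPEN_TAG)
    # The opening tag may be missing if the chat template injected it
    # into the prompt; in that case reasoning starts at position 0.
    reasoning_start = start + len(OPEN_TAG) if 0 <= start < end else 0
    reasoning = text[reasoning_start:end].strip()
    answer = text[end + len(CLOSE_TAG):].strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>2+2 is 4</think>\n\nThe answer is 4."
    reasoning, answer = split_thinking(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Keeping the reasoning rather than discarding it can help when debugging wrong answers on math and code tasks.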

Use Cases

This model is particularly well-suited for applications requiring robust performance in:

  • Mathematical Problem Solving: Excels at complex calculations and proofs.
  • Code Generation and Analysis: Proficient in generating and understanding code, especially Python.
  • Complex Instruction Following: Ideal for tasks where precise adherence to multi-step instructions is critical.
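For math workloads, a common pattern is to ask the model to put its final answer in `\boxed{...}` and extract it programmatically. The card does not say this model was trained on that convention, so treat it as an assumption to check against real outputs. `last_boxed` is a hypothetical helper that returns the contents of the last `\boxed{...}`, counting nested braces so that answers such as `\frac{1}{2}` survive intact.

```python
from typing import Optional

BOXED = "\\boxed{"


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, or None.

    Braces are counted so nested LaTeX groups are kept whole.
    """
    start = text.rfind(BOXED)
    if start == -1:
        return None
    body_start = start + len(BOXED)
    depth = 1  # we are already inside the opening brace of \boxed{
    for j in range(body_start, len(text)):
        ch = text[j]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[body_start:j]
    return None  # unbalanced braces: no complete answer found


if __name__ == "__main__":
    print(last_boxed(r"Therefore x = \boxed{\frac{1}{2}}."))
```

Taking the last match rather than the first favors the final answer when the model restates intermediate results in boxed form.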

Limitations

  • Trained primarily on English and Chinese data; performance in other languages may be limited.
  • The heavy emphasis on math and code in the training data may lead to weaker or less consistent performance in other domains.
  • As a distilled model, it may inherit biases from the Claude-generated training data and has not undergone explicit safety alignment (e.g., RLHF).