Name: clzoro/Qwen3.5-9B-Claude-Distill-v2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: clzoro

Overview

clzoro/Qwen3.5-9B-Claude-Distill-v2 is a 9-billion-parameter model built on the Qwen3.5-9B base and fine-tuned on Claude-generated conversations. The full supervised fine-tuning (SFT) run used 125,175 conversation pairs and aims to improve instruction following and reasoning while retaining the base model's original capabilities. The training data is heavily weighted toward math (65.5%) and code (15.1%), so the model is specialized for those two domains.
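To make the data mix concrete, the percentages above can be turned into approximate pair counts. This is only a rough sketch: the published shares are rounded, so the counts are estimates, and the "other" bucket is simply the unlabeled remainder (the source does not say what it contains). The names TOTAL_PAIRS, SHARES_PER_MILLE and approx_counts are illustrative, not part of any model tooling.

```python
# Approximate pair counts implied by the published training-data shares.
TOTAL_PAIRS = 125_175

# Shares in tenths of a percent, to avoid floating-point noise in the remainder.
SHARES_PER_MILLE = {"math": 655, "code": 151}
# Assumption: everything not labeled math or code is lumped into "other".
SHARES_PER_MILLE["other"] = 1000 - sum(SHARES_PER_MILLE.values())


def approx_counts(total, shares_per_mille):
    """Return the rounded number of pairs for each category."""
    return {name: round(total * share / 1000) for name, share in shares_per_mille.items()}


counts = approx_counts(TOTAL_PAIRS, SHARES_PER_MILLE)
print(counts)
```

Under these assumptions, roughly 82,000 of the pairs are math and about 19,000 are code, leaving about 24,000 for everything else.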

Key Capabilities

Enhanced Instruction Following: Better at understanding and executing complex instructions, a result of distillation from Claude data.
Strong Reasoning: Performs multi-step logical inference, particularly in mathematical and coding contexts.
Default Thinking Mode: Generates intermediate thoughts before the final response by default; thinking can be disabled for direct answers.
High Context Length: Supports a context window of 32,768 tokens, which helps with long or intricate problems.
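The thinking mode and the context limit both affect how a client handles requests and responses. The sketch below shows one way to check a token budget against the 32,768-token window and to separate intermediate thoughts from the final answer. It assumes the thoughts arrive wrapped in <think>...</think> tags, as in other Qwen-family models; that format is not confirmed by this listing. The helpers fits_context and split_thinking are hypothetical, and token counts are taken as inputs rather than computed with the model's tokenizer.

```python
import re

CONTEXT_WINDOW = 32_768  # context length stated in the listing

# Assumption: thoughts are delimited by <think>...</think> tags.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def fits_context(prompt_tokens, max_new_tokens, window=CONTEXT_WINDOW):
    """Return True if the prompt plus the requested output fits in the window."""
    return prompt_tokens + max_new_tokens <= window


def split_thinking(text):
    """Split raw model output into (thoughts, final_answer)."""
    thoughts = "\n".join(block.strip() for block in _THINK_BLOCK.findall(text))
    answer = _THINK_BLOCK.sub("", text).strip()
    return thoughts, answer


raw = "<think>2 + 2 is 4</think>\nThe answer is 4."
thoughts, answer = split_thinking(raw)
print(answer)
```

Because thinking tokens count toward the output, a budget check like fits_context should leave room for them when thinking mode is left on.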

Use Cases

This model is particularly well-suited for applications requiring robust performance in:

Mathematical Problem Solving: Suited to multi-step calculations and proofs, reflecting the math-heavy training data.
Code Generation and Analysis: Generates and explains code, especially Python.
Complex Instruction Following: Suited to tasks where precise adherence to multi-step instructions is critical.

Limitations

Trained primarily on English and Chinese data; performance in other languages is limited.
The heavy emphasis on math and code in the training data may lead to weaker or less consistent performance in other domains.
As a distilled model, it may inherit biases from the Claude-generated training data, and it has not undergone explicit safety alignment (e.g., RLHF).
