Name: clzoro/Qwen3.6-27B-Claude-Distill-v2 API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: clzoro

Model Overview

clzoro/Qwen3.6-27B-Claude-Distill-v2 is a 27-billion-parameter language model built on the Qwen3.6-27B base model. It was trained with full supervised fine-tuning (SFT) on 125,175 Claude-distilled conversation pairs. The goal of this training is to improve the model's instruction following and reasoning by learning from Claude-generated responses.

Key Capabilities & Differentiators

Enhanced Instruction Following: SFT on Claude-distilled data is intended to improve the model's ability to understand and carry out complex instructions.
Strong Math and Code Performance: The training dataset is heavily weighted towards mathematical (65.5%) and coding (15.1%) tasks, so the model is tuned most heavily for these two domains.
Reasoning Focus: A significant portion of the training data is dedicated to reasoning, which supports multi-step problem solving.
"Thinking Mode" Feature: By default, the model runs in a "thinking mode" that generates intermediate reasoning steps before the final response. This mode can be disabled to get direct answers.
Qwen3.6 Base: Inherits the architecture and capabilities of the underlying Qwen3.6-27B model.
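
When thinking mode is left on, a client usually needs to separate the reasoning steps from the final response. The sketch below shows one way to do that in Python. It assumes the reasoning is wrapped in Qwen3-style <think>...</think> tags, which this listing does not confirm, and the helper name split_reasoning is purely illustrative.

```python
import re

# Assumption: the model wraps its reasoning in <think>...</think> tags,
# as earlier Qwen3 releases do. Adjust the pattern if the delimiters differ.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw completion string.

    Hypothetical helper, not part of any official SDK.
    """
    # Collect the text of every reasoning block, one block per line group.
    reasoning = "\n".join(block.strip() for block in _THINK_BLOCK.findall(raw))
    # Whatever remains after removing the blocks is treated as the answer.
    answer = _THINK_BLOCK.sub("", raw).strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>2 + 2 is 4.</think>\nThe answer is 4."
    reasoning, answer = split_reasoning(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

If a completion contains no reasoning block, for example because thinking mode was disabled, the function returns an empty reasoning string and the full text as the answer.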

Ideal Use Cases

Complex Mathematical Problem Solving: Tasks that require detailed mathematical reasoning and worked solutions, the largest category in the training data.
Code Generation and Analysis: Generating code, debugging, and explaining programming logic.
Instruction-Following Applications: Scenarios where precise adherence to instructions is critical.
Reasoning-Intensive Tasks: Problems that demand logical deduction and step-by-step reasoning.

Limitations

Primarily trained on English and Chinese data, potentially limiting performance in other languages.
The strong bias towards math and code in the training data may reduce performance in less-represented domains.
As a distilled model, it may carry biases from the Claude-generated training data.
The model has not undergone RLHF or similar safety alignment.
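
To make the domain skew concrete, the percentages under Key Capabilities can be converted into approximate pair counts. The sketch below uses only the figures stated in this listing (125,175 pairs, 65.5% math, 15.1% code). The "other" bucket is simply the remainder and is an assumption, since the listing does not break it down further.

```python
# Figures taken from the listing above.
TOTAL_PAIRS = 125_175
SHARES_PERCENT = {"math": 65.5, "code": 15.1}

# Assumption: everything not labeled math or code is grouped as "other".
SHARES_PERCENT["other"] = 100.0 - sum(SHARES_PERCENT.values())


def approximate_counts(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Convert percentage shares into rounded example counts."""
    return {name: round(total * pct / 100.0) for name, pct in shares.items()}


if __name__ == "__main__":
    for name, count in approximate_counts(TOTAL_PAIRS, SHARES_PERCENT).items():
        print(f"{name}: about {count:,} pairs ({SHARES_PERCENT[name]:.1f}%)")
```

The counts are estimates only, because the published percentages are themselves rounded to one decimal place.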
