glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think
Text Generation · Model size: 0.8B · Quant: BF16 · Context length: 32k · Published: Dec 23, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think is a 0.8 billion parameter language model fine-tuned from Qwen/Qwen3-0.6B by glogwa68. It is trained on high-reasoning conversational data derived from GLM 4.7 and supports a context length of 40,960 tokens. The model is distinguished by its ability to perform explicit thinking and reasoning, indicated by the use of `<think>` tags in its output, and is aimed at applications requiring advanced conversational reasoning.


Model Overview

glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think is a 0.8 billion parameter language model, fine-tuned from the base Qwen/Qwen3-0.6B architecture. The model was trained by glogwa68 on the TeichAI/glm-4.7-2000x dataset, which consists of high-reasoning conversational data from GLM 4.7.

Key Features & Capabilities

  • Base Model: Qwen/Qwen3-0.6B
  • Fine-tuning Data: Utilizes high-reasoning conversational data from GLM 4.7.
  • Context Length: Supports a substantial context length of 40960 tokens.
  • Special Feature: Incorporates a unique "thinking/reasoning" capability, indicated by the use of <think> tags within its output.
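
Since this is a Qwen3 fine-tune, it should load with the standard `transformers` chat API. A minimal usage sketch follows; the `enable_thinking` flag comes from the upstream Qwen3 chat template, and whether this fine-tune preserves that template is an assumption, as are the prompt and generation settings:

```python
# Minimal sketch, assuming the standard transformers chat API and the
# upstream Qwen3 chat template (enable_thinking is a Qwen3 template option).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 30?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # assumption: the fine-tune honors the Qwen3 thinking switch
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, which should contain the <think> trace.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```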

Training Details

The model was trained for 2 epochs with a learning rate of 2e-5 and an effective batch size of 8 (achieved via gradient accumulation). Training was conducted in FP16 precision on a multi-GPU setup using DeepSpeed ZeRO-3.
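
The card does not publish the training script. A rough reconstruction of the stated hyperparameters with the Hugging Face `Trainer` might look like the sketch below; the per-device batch size / accumulation split and the DeepSpeed config filename are assumptions, not details from the card:

```python
# Hypothetical reconstruction of the stated hyperparameters.
# The 2 x 4 batch/accumulation split and "ds_zero3.json" path are assumptions.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-0.6b-distill-glm-4.7-think",
    num_train_epochs=2,             # stated: 2 epochs
    learning_rate=2e-5,             # stated: 2e-5
    per_device_train_batch_size=2,  # assumption: 2 per device x 4 accumulation = effective 8
    gradient_accumulation_steps=4,  # assumption
    fp16=True,                      # stated: FP16 precision
    deepspeed="ds_zero3.json",      # stated: DeepSpeed ZeRO-3 (config path is hypothetical)
)
```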

Ideal Use Cases

This model is particularly well-suited for applications requiring:

  • Advanced conversational AI with enhanced reasoning.
  • Tasks benefiting from explicit "thinking" processes within the model's generation.
  • Scenarios where a balance between model size and reasoning capability is crucial.
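
Downstream code typically needs to separate the reasoning trace from the final answer. A simple split on the closing tag, assuming the model emits literal `<think>…</think>` markers as the card indicates:

```python
# Split generated text into its reasoning trace and final answer,
# assuming literal <think>...</think> markers as described above.
def split_thinking(generated: str) -> tuple[str, str]:
    if "</think>" in generated:
        thinking, _, answer = generated.partition("</think>")
        return thinking.replace("<think>", "").strip(), answer.strip()
    return "", generated.strip()

thinking, answer = split_thinking(
    "<think>Check 2, 3, 5, ... up to 29.</think>There are 10 primes below 30."
)
print(answer)  # -> There are 10 primes below 30.
```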