glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think
Text Generation · Model size: 0.8B · Quant: BF16 · Context length: 32k · Published: Dec 23, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think is a 0.8 billion parameter language model fine-tuned from Qwen/Qwen3-0.6B by glogwa68. It is trained on high-reasoning conversational data derived from GLM 4.7 and supports a context length of 40,960 tokens. The model is distinguished by its ability to perform explicit thinking and reasoning, indicated by the use of `<think>` tags in its output, and is aimed at applications requiring advanced conversational reasoning.


Model Overview

glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think is a 0.8 billion parameter language model, fine-tuned from the base Qwen/Qwen3-0.6B architecture. The model was trained by glogwa68 on the TeichAI/glm-4.7-2000x dataset, which consists of high-reasoning conversational data from GLM 4.7.

Key Features & Capabilities

  • Base Model: Qwen/Qwen3-0.6B
  • Fine-tuning Data: Utilizes high-reasoning conversational data from GLM 4.7.
  • Context Length: Supports a substantial context length of 40960 tokens.
  • Special Feature: Incorporates a unique "thinking/reasoning" capability, indicated by the use of <think> tags within its output.
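
Since this is a Qwen3 fine-tune, it should load with the standard `transformers` chat API. A minimal usage sketch follows; the `enable_thinking` flag comes from the upstream Qwen3 chat template, and whether this fine-tune preserves that template is an assumption, as are the prompt and generation settings:

```python
# Minimal sketch, assuming the standard transformers chat API and the
# upstream Qwen3 chat template (enable_thinking is a Qwen3 template option).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 30?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # assumption: the fine-tune honors the Qwen3 thinking switch
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, which should contain the <think> trace.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```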

Training Details

The model was trained for 2 epochs with a learning rate of 2e-5 and an effective batch size of 8 (achieved via gradient accumulation). Training was conducted in FP16 precision on a multi-GPU setup using DeepSpeed ZeRO-3.
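
The card does not publish the training script. A rough reconstruction of the stated hyperparameters with the Hugging Face `Trainer` might look like the sketch below; the per-device batch size / accumulation split and the DeepSpeed config filename are assumptions, not details from the card:

```python
# Hypothetical reconstruction of the stated hyperparameters.
# The 2 x 4 batch/accumulation split and "ds_zero3.json" path are assumptions.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-0.6b-distill-glm-4.7-think",
    num_train_epochs=2,             # stated: 2 epochs
    learning_rate=2e-5,             # stated: 2e-5
    per_device_train_batch_size=2,  # assumption: 2 per device x 4 accumulation = effective 8
    gradient_accumulation_steps=4,  # assumption
    fp16=True,                      # stated: FP16 precision
    deepspeed="ds_zero3.json",      # stated: DeepSpeed ZeRO-3 (config path is hypothetical)
)
```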

Ideal Use Cases

This model is particularly well-suited for applications requiring:

  • Advanced conversational AI with enhanced reasoning.
  • Tasks benefiting from explicit "thinking" processes within the model's generation.
  • Scenarios where a balance between model size and reasoning capability is crucial.
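
Downstream code typically needs to separate the reasoning trace from the final answer. A simple split on the closing tag, assuming the model emits literal `<think>…</think>` markers as the card indicates:

```python
# Split generated text into its reasoning trace and final answer,
# assuming literal <think>...</think> markers as described above.
def split_thinking(generated: str) -> tuple[str, str]:
    if "</think>" in generated:
        thinking, _, answer = generated.partition("</think>")
        return thinking.replace("<think>", "").strip(), answer.strip()
    return "", generated.strip()

thinking, answer = split_thinking(
    "<think>Check 2, 3, 5, ... up to 29.</think>There are 10 primes below 30."
)
print(answer)  # -> There are 10 primes below 30.
```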