TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill

Text generation · Model size: 4B · Quantization: BF16 · Context length: 32K · Published: Nov 17, 2025 · License: apache-2.0 · Architecture: Transformer

TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill is a 4-billion-parameter Qwen3-based language model developed by TeichAI and fine-tuned to distill the behavior, reasoning, and knowledge of Gemini 2.5 Flash. Trained on approximately 54.4 million tokens spanning domains such as academia, finance, health, and programming, it outperforms its base model on multiple benchmarks. It is suited to tasks that benefit from nuanced understanding and an output style modeled on Gemini 2.5 Flash.


Model Overview

TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill is a 4-billion-parameter model built on the Qwen3 architecture. TeichAI produced it through a distillation process, fine-tuning on approximately 54.4 million tokens generated by Gemini 2.5 Flash. The goal of this training was to transfer the behavior, reasoning traces, output style, and knowledge of the larger Gemini 2.5 Flash model into a more compact model.
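As a usage reference, here is a minimal sketch with Hugging Face transformers; the prompt and generation settings are illustrative, and the reasoning-trace parsing follows the pattern documented for the base Qwen3-Thinking checkpoints rather than anything stated in this card.

```python
# Minimal usage sketch with Hugging Face transformers. The prompt and
# generation settings are illustrative; the </think> parsing follows the
# pattern documented for the base Qwen3-Thinking checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain duration risk in bond portfolios."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=2048)
new_tokens = outputs[0][input_ids.shape[-1]:].tolist()

# Qwen3-Thinking models close their reasoning trace with a </think> token
# (id 151668 in the Qwen3 vocabulary); split the output there.
try:
    split = len(new_tokens) - new_tokens[::-1].index(151668)
except ValueError:
    split = 0  # no </think> found; treat everything as the answer
thinking = tokenizer.decode(new_tokens[:split], skip_special_tokens=True)
answer = tokenizer.decode(new_tokens[split:], skip_special_tokens=True)
print(answer.strip())
```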

Key Capabilities & Performance

This model shows improved performance compared to its base model, unsloth/Qwen3-4B-Thinking-2507, across several benchmarks (a sketch for reproducing such evaluations follows the list). It achieved higher scores on:

  • ARC Challenge: 0.511945 (vs 0.486348)
  • GPQA Diamond Zeroshot: 0.353535 (vs 0.30303)
  • HellaSwag: 0.504382 (vs 0.479785)
  • MMLU: 0.661587 (vs 0.65532)
  • Winogrande: 0.65588 (vs 0.64562)
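The card does not say which harness produced the numbers above, but the task names match EleutherAI's lm-evaluation-harness, so a reproduction attempt might look like the sketch below; the exact task variants and settings are assumptions.

```python
# Hedged reproduction sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval). The task names and settings are assumptions;
# the model card does not state how the reported scores were produced.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill,"
        "dtype=bfloat16"
    ),
    tasks=["arc_challenge", "gpqa_diamond_zeroshot", "hellaswag",
           "mmlu", "winogrande"],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```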

The training dataset covered a wide range of categories, including Academia, Finance, Health, Legal, Marketing, Programming, SEO, and Science, indicating a broad knowledge base. The model was trained with Unsloth and Hugging Face's TRL library for faster fine-tuning; a rough sketch of that kind of setup follows.
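TeichAI's actual training script is not reproduced here; purely as an illustration of the Unsloth + TRL combination, a supervised fine-tuning setup could look like the following, where the dataset file, LoRA settings, and hyperparameters are all placeholders.

```python
# Illustrative Unsloth + TRL supervised fine-tuning sketch. The dataset
# file, LoRA settings, and hyperparameters are placeholders, not the
# configuration TeichAI actually used.
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-4B-Thinking-2507",
    max_seq_length=32768,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical JSONL of Gemini 2.5 Flash transcripts with a "text" column
# holding pre-formatted chat examples.
dataset = load_dataset("json", data_files="gemini_flash_distill.jsonl",
                       split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        output_dir="qwen3-4b-gemini-distill",
    ),
)
trainer.train()
```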

Ideal Use Cases

This model is particularly well-suited for applications where:

  • Mimicking the output style and reasoning of Gemini 2.5 Flash is beneficial.
  • Tasks require knowledge across diverse domains such as finance, health, legal, and programming.
  • Resource-efficient deployment of a model with distilled advanced capabilities is desired; a quantized-loading sketch follows the list.
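For the resource-efficiency point, one common deployment option (generic to transformers models, not something this card specifies) is 4-bit loading with bitsandbytes; a minimal sketch:

```python
# Hedged sketch of memory-efficient 4-bit loading via bitsandbytes; the
# quantization settings are common defaults, not a recommendation from
# the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
print(f"Approx. weight memory: {model.get_memory_footprint() / 1e9:.1f} GB")
```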