hmdmahdavi/olympiad-curated-qwen3-4b-thinking-distill-30b

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Jan 11, 2026Architecture:Transformer Warm

The hmdmahdavi/olympiad-curated-qwen3-4b-thinking-distill-30b is a 4 billion parameter language model, fine-tuned from Qwen/Qwen3-4B-Thinking-2507. This model leverages a 40960-token context length and was trained using the TRL framework. It is designed for general text generation tasks, building upon the Qwen3 architecture.

Loading preview...

Model Overview

This model, hmdmahdavi/olympiad-curated-qwen3-4b-thinking-distill-30b, is a fine-tuned variant of the Qwen3-4B-Thinking-2507 base model, developed by hmdmahdavi. It features 4 billion parameters and supports a substantial context length of 40960 tokens, making it suitable for processing longer inputs and generating detailed responses.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen3-4B-Thinking-2507.
  • Training Framework: Utilizes the TRL library for its training procedure, specifically employing Supervised Fine-Tuning (SFT).
  • Parameter Count: A compact 4 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Benefits from a 40960-token context window, enabling it to handle extensive conversational histories or documents.

Usage

This model is primarily intended for text generation tasks. Developers can quickly integrate it using the Hugging Face transformers library, as demonstrated in the quick start example provided in its model card. The training process and metrics can be further explored via its Weights & Biases run.