boradorish/qwen3-0.6b

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.8BQuant:BF16Ctx Length:32kPublished:May 13, 2026License:otherArchitecture:Transformer Warm

The boradorish/qwen3-0.6b model is a 0.8 billion parameter language model, fine-tuned from the Qwen/Qwen3-0.6B architecture. This model has been specifically fine-tuned on the 'sunny_reasoning' dataset, indicating an optimization for reasoning tasks. It supports a context length of 32768 tokens, making it suitable for applications requiring extensive contextual understanding and generation.

Loading preview...

Overview

The boradorish/qwen3-0.6b model is a specialized language model, fine-tuned from the base Qwen/Qwen3-0.6B architecture. With 0.8 billion parameters and a substantial context length of 32768 tokens, this model is designed for tasks that benefit from deep contextual understanding.

Key Capabilities

  • Reasoning Focus: The model has undergone fine-tuning on the sunny_reasoning dataset, suggesting an enhanced capability for logical inference and problem-solving tasks.
  • Extended Context: Its 32768-token context window allows for processing and generating longer sequences of text, beneficial for complex queries or document analysis.

Training Details

The fine-tuning process utilized specific hyperparameters to optimize performance:

  • Learning Rate: 4e-05
  • Batch Size: A total training batch size of 64 (with train_batch_size 4 and gradient_accumulation_steps 8) across 2 multi-GPU devices.
  • Optimizer: ADAMW_TORCH_FUSED with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with 0.1 warmup steps over 3 epochs.

Good For

  • Applications requiring strong reasoning capabilities.
  • Tasks that benefit from processing and generating long text sequences.