boradorish/qwen3-0.6b
The boradorish/qwen3-0.6b model is a 0.8 billion parameter language model fine-tuned from Qwen/Qwen3-0.6B on the 'sunny_reasoning' dataset, which suggests it targets reasoning tasks. It supports a context length of 32768 tokens, which suits applications that work with long inputs or outputs.
Overview
boradorish/qwen3-0.6b is a fine-tune of the base Qwen/Qwen3-0.6B model. It has 0.8 billion parameters and a context length of 32768 tokens, so it can take long prompts or whole documents as input.
Key Capabilities
- Reasoning Focus: The model was fine-tuned on the sunny_reasoning dataset, which suggests (but does not by itself demonstrate) improved logical inference and problem solving.
- Extended Context: Its 32768-token context window lets it process and generate longer sequences of text, which helps with complex queries or document analysis.
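Because the prompt and the generated text share the same 32768-token window, it can help to compute how much room is left for generation before sending a long prompt. The helper below is a minimal sketch of that arithmetic; the function name `max_new_tokens_for` is hypothetical and not part of any library, and the prompt token count is assumed to come from the model's tokenizer.

```python
CONTEXT_LENGTH = 32768  # context window stated in this card


def max_new_tokens_for(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after a prompt of this size.

    Hypothetical helper for illustration: the prompt and the completion
    together must fit inside the context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Clamp at zero: a prompt longer than the window leaves no room to generate.
    return max(context_length - prompt_tokens, 0)


if __name__ == "__main__":
    print(max_new_tokens_for(30000))
```

A prompt that already exceeds the window yields 0, which signals that the input would need to be truncated or split first.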
Training Details
Fine-tuning used the following hyperparameters:
- Learning Rate: 4e-05
- Batch Size: A total training batch size of 64 (train_batch_size 4 and gradient_accumulation_steps 8, across 2 GPUs).
- Optimizer: ADAMW_TORCH_FUSED with default betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a warmup value of 0.1 (most plausibly a warmup ratio, i.e. the first 10% of training steps), over 3 epochs.
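The hyperparameters above can be checked with a little arithmetic. The sketch below shows how the total batch size of 64 follows from the per-device batch size, gradient accumulation, and device count, and how a cosine schedule with linear warmup shapes the learning rate. It assumes the 0.1 warmup value is a ratio of total steps; the total step count passed in is a hypothetical value, since the dataset size is not stated. This is an illustration of the general technique, not the training script that was actually used.

```python
import math

# Values taken from the hyperparameter list above.
PER_DEVICE_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
NUM_DEVICES = 2
PEAK_LR = 4e-5
WARMUP_RATIO = 0.1  # assumption: the "0.1" warmup value is a ratio of total steps


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * accumulation_steps * num_devices


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))
    total = 1000  # hypothetical number of optimizer steps
    for s in (0, 50, 100, 550, 1000):
        print(s, cosine_lr(s, total))
```

With these settings the learning rate climbs linearly to 4e-05 over the first tenth of training and then decays along a half cosine toward zero.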
Good For
- Applications requiring strong reasoning capabilities.
- Tasks that benefit from processing and generating long text sequences.
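To try the model on such tasks, a loading sketch along the following lines may be a reasonable starting point. It assumes the checkpoint is published on the Hugging Face Hub under the ID `boradorish/qwen3-0.6b` and loads with the standard `transformers` auto classes (with PyTorch installed); neither is confirmed by this card. The helper names `load_model` and `generate` are hypothetical, and the prompt is passed as plain text without a chat template.

```python
MODEL_ID = "boradorish/qwen3-0.6b"  # assumed to be the Hub repository ID


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model (assumes the repo is available on the Hub)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion for a plain-text prompt."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("If all bloops are razzies and all razzies are lazzies, are all bloops lazzies?"))
```

Calling `generate` downloads the weights on first use, so it needs network access and enough memory for a model of this size.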