boradorish/qwen3-0.6b-fc
The boradorish/qwen3-0.6b-fc model is a fine-tuned version of the Qwen3-0.6B architecture, featuring 0.8 billion parameters and a 32768-token context length. This model has been specifically fine-tuned on the sunny_reasoning dataset, suggesting an optimization for reasoning tasks. It is intended for applications requiring a compact yet capable language model with enhanced reasoning abilities.
Loading preview...
Model Overview
The boradorish/qwen3-0.6b-fc is a fine-tuned language model based on the Qwen/Qwen3-0.6B architecture. With approximately 0.8 billion parameters and a substantial 32768-token context window, this model is designed for efficient processing of longer sequences.
Key Characteristics
- Base Model: Qwen/Qwen3-0.6B
- Parameter Count: 0.8 billion
- Context Length: 32768 tokens
- Fine-tuning Dataset: sunny_reasoning
Training Details
The model was trained using the following hyperparameters:
- Learning Rate: 4e-05
- Batch Size: 4 (train), 8 (eval)
- Gradient Accumulation: 8 steps, leading to a total effective batch size of 64
- Optimizer: AdamW (fused) with betas=(0.9, 0.999) and epsilon=1e-08
- LR Scheduler: Cosine with 0.1 warmup steps
- Epochs: 3.0
Intended Use
While specific intended uses and limitations require more detailed information, the fine-tuning on the sunny_reasoning dataset suggests its primary strength lies in tasks that involve logical deduction, problem-solving, and understanding complex relationships within text. Developers looking for a compact model with enhanced reasoning capabilities for specific applications may find this model suitable.