Overview
This model, nandansarkar/base_qwen3_0-6B_filter, is a fine-tuned variant of the Qwen3-0.6B base model, developed by nandansarkar. It has approximately 0.8 billion parameters and supports a context length of 40,960 tokens. The model was trained with supervised fine-tuning (SFT) for 13 epochs on a dataset referred to as sft_dataset.
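As a minimal usage sketch, assuming the checkpoint follows the standard Hugging Face Transformers layout (the prompt and generation settings below are illustrative, not prescribed by the model author):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nandansarkar/base_qwen3_0-6B_filter"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # requires the `accelerate` package
)

# Illustrative prompt; the card does not specify a prompt format.
prompt = "Explain supervised fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```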
Training Details
The model was fine-tuned with the following hyperparameters:

- learning_rate: 1e-05
- train_batch_size: 2 (per device)
- gradient_accumulation_steps: 8
- total_train_batch_size: 32 (2 GPUs × 2 per-device batch × 8 accumulation steps)
- optimizer: adamw_torch with standard betas and epsilon
- lr_scheduler_type: cosine, with warmup_ratio 0.05
- num_epochs: 13
- hardware: 2 GPUs
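The original training script is not published; the following is a hypothetical sketch of how the reported hyperparameters would map onto Hugging Face TrainingArguments (the output_dir is a placeholder):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported settings, not the author's script.
args = TrainingArguments(
    output_dir="base_qwen3_0-6B_filter",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=2,   # reported train_batch_size
    gradient_accumulation_steps=8,   # 2 GPUs x 2 x 8 = effective batch of 32
    num_train_epochs=13,
    optim="adamw_torch",             # PyTorch AdamW with default betas/epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
)
```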
Intended Uses & Limitations
The intended uses and limitations are not documented. Given its fine-tuned nature, the model is presumably best suited to tasks resembling those in sft_dataset. Because no performance metrics or application guidance are published, users should evaluate the model on their own use cases before deployment.