nandansarkar/base_qwen3_0-6B_filter
The nandansarkar/base_qwen3_0-6B_filter model is a fine-tuned version of Qwen3-0.6B, developed by nandansarkar. This causal language model is listed with 0.8 billion parameters and a context length of 40960 tokens. It was fine-tuned on a dataset identified only as sft_dataset, so the tasks it targets are not stated beyond supervised fine-tuning on that data.
Overview
nandansarkar/base_qwen3_0-6B_filter is a fine-tuned version of the Qwen3-0.6B base model, developed by nandansarkar. It is listed with 0.8 billion parameters and supports a context length of 40960 tokens. The model underwent supervised fine-tuning (SFT) for 13 epochs on a dataset referred to as sft_dataset.
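As a rough illustration of how a checkpoint like this is typically loaded, the sketch below uses the Hugging Face transformers Auto classes. It assumes the repository is publicly available on the Hugging Face Hub under the name above and that transformers and PyTorch are installed; neither is confirmed by the card. The helper names generate and fits_in_context are hypothetical, and treating the 40960-token limit as covering prompt plus generated tokens is also an assumption.

```python
MODEL_ID = "nandansarkar/base_qwen3_0-6B_filter"
MAX_CONTEXT_TOKENS = 40960  # context length stated on the card


def fits_in_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Check that prompt plus generated tokens stay within the stated context.

    Assumption: the limit applies to the total sequence length.
    """
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT_TOKENS


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical helper: load the model and complete a prompt."""
    # Imported here so the constants above are usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_tokens = inputs["input_ids"].shape[1]
    if not fits_in_context(prompt_tokens, max_new_tokens):
        raise ValueError("prompt plus max_new_tokens exceeds the context length")

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate with a short prompt downloads the weights on first use. Since the card does not say what prompt format sft_dataset used, plain-text prompting is only a starting point.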
Training Details
Fine-tuning used a learning rate of 1e-05, a train_batch_size of 2, and gradient_accumulation_steps of 8. Training ran across 2 GPUs, so the effective total_train_batch_size is 2 × 8 × 2 = 32. The optimizer was adamw_torch with standard beta and epsilon values, and the learning rate followed a cosine schedule with a warmup_ratio of 0.05.
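The batch-size arithmetic and the shape of the learning-rate schedule can be sketched in plain Python. This is a minimal illustration rather than the actual training code: it assumes the common formulation of linear warmup from zero followed by cosine decay to zero, and the function names and the total step count of 1000 are hypothetical, since the card does not report the number of optimizer steps.

```python
import math

# Values reported on the card.
TRAIN_BATCH_SIZE = 2          # per device
GRAD_ACCUM_STEPS = 8
NUM_GPUS = 2
PEAK_LR = 1e-5
WARMUP_RATIO = 0.05


def effective_batch_size(per_device: int, accum_steps: int, num_gpus: int) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device * accum_steps * num_gpus


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical step count, chosen only to make the schedule concrete.
TOTAL_STEPS = 1000
total_batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_GPUS)
```

With these inputs, total_batch matches the reported total_train_batch_size of 32, and the learning rate reaches its peak once the first 5% of steps have passed.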
Intended Uses & Limitations
The provided information does not detail intended uses or limitations. Because the model was fine-tuned on sft_dataset, it is presumably best suited to tasks resembling that data. No performance metrics or application guidance are given, so users should evaluate the model on their own use cases before relying on it.