nandansarkar/base_qwen3_0-6B_filter

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 0.8B | Quant: BF16 | Ctx Length: 32k | Published: Dec 11, 2025 | License: other | Architecture: Transformer | Warm

The nandansarkar/base_qwen3_0-6B_filter model is a fine-tuned version of Qwen3-0.6B, developed by nandansarkar. This 0.8 billion parameter causal language model has a context length of 40960 tokens. It was fine-tuned on a dataset named sft_dataset, which indicates supervised fine-tuning; the dataset's contents and target tasks are not described.


Overview

This model, nandansarkar/base_qwen3_0-6B_filter, is a fine-tuned variant of the Qwen3-0.6B base model, developed by nandansarkar. It has 0.8 billion parameters and a context length of 40960 tokens. The model underwent supervised fine-tuning (SFT) for 13 epochs on a dataset referred to as sft_dataset.
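A minimal sketch of loading the model for text generation follows. It assumes the `transformers` and `torch` packages are installed, that the repository can be downloaded from the Hugging Face Hub, and that the standard `AutoModelForCausalLM` loading path works for this checkpoint; none of this is confirmed by the card itself. The helper names `load_model` and `generate` are hypothetical.

```python
MODEL_ID = "nandansarkar/base_qwen3_0-6B_filter"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model in BF16 (the quantization listed on the card)."""
    # Imported lazily so that defining this helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt` (assumption: plain-text prompting is suitable)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Some prompt")` downloads the weights on first use. Since the card does not describe the prompt format used during fine-tuning, the right input format is something to determine experimentally.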

Training Details

Fine-tuning used a learning rate of 1e-05, a train_batch_size of 2 per GPU, and gradient_accumulation_steps of 8. Training ran across 2 GPUs, so the effective total_train_batch_size is 2 × 8 × 2 = 32. The optimizer was adamw_torch with standard beta values and epsilon, paired with a cosine learning rate scheduler and a warmup_ratio of 0.05.
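The batch-size arithmetic and the shape of the learning rate schedule can be sketched in plain Python. This is an illustration under assumptions, not the author's training code: it assumes the common linear-warmup-then-cosine-decay-to-zero shape, and `total_steps` is a hypothetical value because the card does not state the dataset size or step count.

```python
import math

# Values stated on the model card.
PER_DEVICE_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8
NUM_GPUS = 2
PEAK_LR = 1e-5
WARMUP_RATIO = 0.05


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * accumulation_steps * num_devices


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay to 0 (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH_SIZE,
                               GRADIENT_ACCUMULATION_STEPS, NUM_GPUS))  # 32
    total_steps = 1000  # hypothetical; not given on the card
    for step in (0, 25, 50, 500, 1000):
        print(step, lr_at_step(step, total_steps))
```

With a warmup ratio of 0.05, the first 5% of optimizer steps ramp the learning rate up to 1e-05, and the remaining 95% follow the cosine curve down.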

Intended Uses & Limitations

Specific intended uses and limitations are not detailed in the provided information. Because the model was fine-tuned on sft_dataset, it is presumably best suited to tasks resembling that data. No performance metrics or application guidance are available, so users should evaluate the model on their own use cases before relying on it.