Name: nandansarkar/base_qwen3_0-6B_filter API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: nandansarkar

Overview

nandansarkar/base_qwen3_0-6B_filter is a fine-tuned version of the Qwen3-0.6B base model, developed by nandansarkar. It has 0.8 billion parameters and supports a context length of 40960 tokens. The model was trained with supervised fine-tuning (SFT) for 13 epochs on a dataset referred to as sft_dataset.

Training Details

Fine-tuning used a learning rate of 1e-05, a train_batch_size of 2 per device, and gradient_accumulation_steps of 8. Training ran across 2 GPUs, so the effective total_train_batch_size is 2 x 8 x 2 = 32. The optimizer was adamw_torch with standard beta values and epsilon, paired with a cosine learning rate scheduler and a warmup_ratio of 0.05.
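The arithmetic behind these settings can be sketched in a few lines of Python. This is an illustration rather than the author's training code: the function names are hypothetical, the total number of optimizer steps is not given in the source (the 1000 used below is a placeholder), and the schedule assumes the common shape of linear warmup followed by cosine decay to zero.

```python
import math

# Values stated in the training details above.
PER_DEVICE_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8
NUM_GPUS = 2
PEAK_LEARNING_RATE = 1e-05
WARMUP_RATIO = 0.05


def effective_batch_size(per_device, accumulation_steps, num_devices):
    """Number of samples that contribute to a single optimizer update."""
    return per_device * accumulation_steps * num_devices


def learning_rate_at(step, total_steps,
                     peak_lr=PEAK_LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0 to 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(PER_DEVICE_BATCH_SIZE,
                               GRADIENT_ACCUMULATION_STEPS,
                               NUM_GPUS))
    TOTAL_STEPS = 1000  # hypothetical; the real step count is not stated
    for step in (0, 25, 50, 500, 1000):
        print(step, learning_rate_at(step, TOTAL_STEPS))
```

With a warmup_ratio of 0.05, the first 5% of steps ramp the learning rate up to 1e-05, and the remaining 95% decay it along a half cosine.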

Intended Uses & Limitations

The provided information does not detail specific intended uses or limitations. Because the model was fine-tuned on sft_dataset, it is likely best suited to tasks resembling that data. No performance metrics or application guidance are given, so users should evaluate the model on their own use cases before relying on it.
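Since no metrics are published, a small evaluation harness is one way to check the model against your own examples. The sketch below is an assumption-laden starting point, not a recommended benchmark: exact_match_accuracy is a hypothetical helper, exact string match is only one possible metric, and the generate callable stands in for whatever inference function you use.

```python
from typing import Callable, Iterable, Tuple


def exact_match_accuracy(generate: Callable[[str], str],
                         examples: Iterable[Tuple[str, str]]) -> float:
    """Fraction of prompts whose output equals the expected text,
    compared after trimming surrounding whitespace."""
    examples = list(examples)
    if not examples:
        raise ValueError("need at least one (prompt, expected) pair")
    hits = sum(
        1 for prompt, expected in examples
        if generate(prompt).strip() == expected.strip()
    )
    return hits / len(examples)


if __name__ == "__main__":
    # Stand-in for a real model call; replace with your own inference function.
    def echo_model(prompt: str) -> str:
        return prompt.upper()

    sample = [("yes", "YES"), ("no", "maybe")]
    print(exact_match_accuracy(echo_model, sample))
```

Passing the model call in as a plain function keeps the harness independent of any particular inference library or hosting API.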
