chancharikm/all_sft_formats_balanced_20260222_ep3_lr3e5_qwen3-vl-8b

  • Modality: Vision
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 32k
  • Published: Nov 26, 2025
  • License: apache-2.0
  • Architecture: Transformer (open weights)

chancharikm/all_sft_formats_balanced_20260222_ep3_lr3e5_qwen3-vl-8b is an 8-billion-parameter vision-language model fine-tuned from Qwen/Qwen3-VL-8B-Instruct. It was trained on the all_sft_formats_unbalanced_20251122_part_1 dataset, which suggests optimization across diverse supervised fine-tuning (SFT) formats. The model targets general language understanding and generation, and it may retain multimodal capabilities from its Qwen3-VL base.


Model Overview

This checkpoint was fine-tuned from the Qwen/Qwen3-VL-8B-Instruct base model for 3 epochs at a learning rate of 3e-05, using a distributed setup spanning 64 devices.
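For quick experimentation, the checkpoint can presumably be loaded through the Hugging Face transformers API like any other Qwen3-VL fine-tune. The sketch below is illustrative rather than taken from the card: it assumes a transformers version with Qwen3-VL support and uses the generic AutoModelForImageTextToText / AutoProcessor entry points; only the repository id comes from this page.

```python
# Minimal loading sketch (assumes transformers with Qwen3-VL support).
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "chancharikm/all_sft_formats_balanced_20260222_ep3_lr3e5_qwen3-vl-8b"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; the card lists FP8 only for serving
    device_map="auto",
)

messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Summarize what a cosine LR schedule does."},
    ]},
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens before decoding the reply.
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```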

Training Details

The model was fine-tuned on the all_sft_formats_unbalanced_20251122_part_1 dataset. Key training hyperparameters are listed below, followed by a configuration sketch:

  • Learning Rate: 3e-05
  • Optimizer: AdamW with fused implementation (betas=(0.9, 0.999), epsilon=1e-08)
  • Batch Size: 10 per device (train), 8 per device (eval), with 2 gradient accumulation steps; across 64 devices this yields an effective training batch size of 10 × 2 × 64 = 1280.
  • Epochs: 3.0
  • LR Scheduler: Cosine schedule with a 0.05 warmup ratio.
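As a reading aid, these values map onto a Hugging Face TrainingArguments configuration roughly as follows. This is a hedged reconstruction, not the author's training script: the output directory and the bf16 flag are assumptions, and only the numeric values come from the list above.

```python
# Reconstructed hyperparameters as TrainingArguments (illustrative only).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-vl-8b-sft",     # hypothetical name, not from the card
    num_train_epochs=3.0,
    learning_rate=3e-5,
    per_device_train_batch_size=10,   # x 2 accumulation x 64 devices = 1280
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_torch_fused",        # fused AdamW, as stated above
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption; precision is not stated
)
```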

Intended Use

The provided README does not detail specific intended uses or limitations. Given its Qwen3-VL-8B-Instruct foundation and fine-tuning across diverse SFT formats, the model should suit a broad range of instruction-following and generative tasks, potentially including multimodal applications inherited from the base model.
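If the base model's vision pathway survives the fine-tune, image inputs would follow the standard Qwen-VL chat-template pattern. A speculative sketch, reusing the model and processor from the loading example above; the image URL is a placeholder, not from the card:

```python
# Speculative multimodal call; `model` and `processor` come from the
# loading sketch earlier in this card.
messages = [
    {"role": "user", "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder
        {"type": "text", "text": "Describe this image in one sentence."},
    ]},
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```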