Name: chancharikm/all_sft_formats_20251106_ep5_lr3e5_qwen3-vl-8b_new API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: chancharikm

Model Overview

This model, chancharikm/all_sft_formats_20251106_ep5_lr3e5_qwen3-vl-8b_new, is an 8 billion parameter vision-language model fine-tuned from Qwen3-VL-8B-Instruct. It accepts both visual and textual inputs and supports a context length of 32768 tokens.

Key Characteristics

Base Model: Fine-tuned from Qwen3-VL-8B-Instruct, inheriting its multimodal capabilities.
Parameter Count: 8 billion parameters, offering a balance between performance and computational efficiency.
Context Length: Supports a long context window of 32768 tokens, beneficial for complex multimodal interactions.
Specialized Fine-tuning: Trained on the all_sft_formats_20251106 dataset, whose name suggests it covers multiple supervised fine-tuning (SFT) data formats.
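
As a rough illustration of what the 8 billion parameter count implies for deployment, the sketch below estimates the memory needed for the weights alone. The function name and the bytes-per-parameter values are assumptions chosen for illustration, not figures from the model card. The nominal 8e9 count is approximate, and activations, the KV cache, and framework overhead are excluded.

```python
# Illustrative estimate only: weight memory = parameters x bytes per parameter.
NOMINAL_PARAMS = 8e9  # "8 billion" as stated above; the exact count may differ


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return approximate weight storage in GiB (no activations, no KV cache)."""
    return num_params * bytes_per_param / 2**30


# Common storage precisions (assumed here, not stated in the model card).
for dtype, nbytes in {"float32": 4, "bfloat16": 2, "int8": 1}.items():
    print(f"{dtype}: ~{weight_memory_gib(NOMINAL_PARAMS, nbytes):.1f} GiB")
```

Actual memory use at inference time will be higher, particularly when long contexts approach the 32768-token limit.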

Training Details

The model was trained for 5 epochs with a learning rate of 3e-05. Key hyperparameters included a per-device train_batch_size of 10 and gradient_accumulation_steps of 2 across 64 devices, giving a total effective batch size of 1280 (10 × 2 × 64). The optimizer was AdamW with a cosine learning rate schedule and a warmup ratio of 0.05.
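
The figures above can be checked with a short sketch. It recomputes the effective batch size from the stated values and shows one common form of a cosine schedule with warmup. The exact schedule used in training is not given in the source, so the linear warmup, the decay to zero, the function names, and the total step count of 1000 are all assumptions for illustration.

```python
import math

# Values stated in the Training Details paragraph.
PER_DEVICE_BATCH = 10
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 64
PEAK_LR = 3e-5
WARMUP_RATIO = 0.05


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Samples contributing to each optimizer step."""
    return per_device * accum * devices


def lr_at(step: int, total_steps: int,
          peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Assumed schedule shape: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 1280

TOTAL_STEPS = 1000  # hypothetical; the real step count is not stated
for step in (0, 25, 50, 500, 1000):
    print(step, lr_at(step, TOTAL_STEPS))
```

With a warmup ratio of 0.05, the learning rate in this sketch rises over the first 5% of steps, reaches the 3e-05 peak, and then decays along a half cosine.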
