chancharikm/sft_caption_generation_20260222_ep6_lr3e5_qwen3-vl-8b

  • Modality: Vision
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 32k
  • Published: Mar 31, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Tags: Open Weights, Cold

The chancharikm/sft_caption_generation_20260222_ep6_lr3e5_qwen3-vl-8b is an 8 billion parameter vision-language model, fine-tuned from Qwen/Qwen3-VL-8B-Instruct. This model specializes in image caption generation, leveraging its base architecture's 32K token context length. It is optimized for tasks requiring descriptive text output based on visual input.


Overview

This model, chancharikm/sft_caption_generation_20260222_ep6_lr3e5_qwen3-vl-8b, is a fine-tuned iteration of the Qwen3-VL-8B-Instruct base model, developed by chancharikm. It features 8 billion parameters and maintains the original model's substantial 32,768 token context length, making it suitable for processing extensive visual and textual inputs.

Key Capabilities

  • Image Caption Generation: The model has been specifically fine-tuned on the sft_caption_generation_20260222 dataset, indicating its primary strength in generating descriptive captions for images.
  • Vision-Language Understanding: Inherits the multimodal capabilities of the Qwen3-VL-8B-Instruct architecture, allowing it to interpret visual information and produce relevant textual outputs.
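A captioning call can be sketched with the generic `transformers` image-text API. This is a sketch under assumptions: the exact chat template and model class for Qwen3-VL checkpoints may differ (check the base model's card), and the inference branch requires a GPU and the downloaded weights, so it is gated behind an environment variable here.

```python
import os

# Fine-tuned checkpoint discussed in this card.
MODEL_ID = "chancharikm/sft_caption_generation_20260222_ep6_lr3e5_qwen3-vl-8b"

def build_caption_request(image_path: str,
                          prompt: str = "Describe this image in one sentence."):
    """Build the chat-style multimodal messages payload for one image.

    The {"type": "image"} / {"type": "text"} content format follows the
    common transformers chat-template convention (an assumption for this
    specific checkpoint).
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]

if os.environ.get("RUN_VLM_INFERENCE"):  # needs a GPU and the model weights
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, device_map="auto")

    messages = build_caption_request("photo.jpg")
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens.
    print(processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```

The message-building helper is plain Python and can be reused with any chat-template-based runtime; only the gated branch depends on the assumed model class.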

Training Details

The model underwent supervised fine-tuning (SFT) with a learning rate of 3e-05 over 6 epochs. Training utilized a distributed setup across 8 GPUs, with a total batch size of 128 (achieved with gradient accumulation steps of 2). The optimizer used was adamw_torch_fused with a cosine learning rate scheduler and a warmup ratio of 0.05.
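The hyperparameters above imply a per-device batch size of 8 (128 / (8 GPUs × 2 accumulation steps)). The arithmetic, plus the shape of a cosine schedule with linear warmup, can be sketched as follows; the schedule function is a generic approximation of what cosine schedulers (e.g. in `transformers`) compute, not this run's exact code.

```python
import math

# Effective batch size from the training setup described above.
PER_DEVICE_BATCH = 8      # implied by the other three values
NUM_GPUS = 8
GRAD_ACCUM_STEPS = 2
TOTAL_BATCH = PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM_STEPS  # 128

def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 3e-5, warmup_ratio: float = 0.05) -> float:
    """Cosine decay with linear warmup over the first warmup_ratio of steps."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With warmup_ratio 0.05 the learning rate ramps to 3e-05 over the first 5% of steps, then decays to zero on a cosine curve.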

Intended Use Cases

This model is best suited for applications requiring automated, high-quality image descriptions, such as:

  • Content accessibility (e.g., generating alt text for images).
  • Automated content moderation or tagging.
  • Enhancing searchability of image databases through descriptive captions.
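For the alt-text use case, generated captions must be escaped before embedding in HTML. A minimal sketch, where `caption_fn` is a hypothetical stand-in for a call to the captioning model (any callable mapping an image path to a string works):

```python
import html

def alt_text_img_tag(image_src: str, caption_fn) -> str:
    """Render an <img> tag whose alt text is a model-generated caption.

    caption_fn: callable(image_src) -> str, e.g. a wrapper around the
    captioning model described in this card (hypothetical here).
    """
    caption = caption_fn(image_src).strip()
    # Escape quotes so the caption cannot break out of the attribute.
    return (f'<img src="{html.escape(image_src, quote=True)}" '
            f'alt="{html.escape(caption, quote=True)}">')

# Usage with a stub captioner in place of the real model:
tag = alt_text_img_tag("cat.jpg", lambda _: 'A cat on a "sunny" windowsill')
```

The same escaping pattern applies when feeding captions into tagging or search-index pipelines that store HTML.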