chancharikm/all_sft_formats_20251106_ep5_lr3e5_qwen3-vl-8b_new

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Nov 12, 2025License:apache-2.0Architecture:Transformer Open Weights Cold

The chancharikm/all_sft_formats_20251106_ep5_lr3e5_qwen3-vl-8b_new is an 8 billion parameter vision-language model, fine-tuned from Qwen3-VL-8B-Instruct, with a context length of 32768 tokens. This model is specifically fine-tuned on the all_sft_formats_20251106 dataset, indicating a specialization in handling diverse supervised fine-tuning formats. Its primary application is likely in multimodal tasks that benefit from instruction-following capabilities derived from its base model and specialized fine-tuning.

Loading preview...

Model Overview

This model, chancharikm/all_sft_formats_20251106_ep5_lr3e5_qwen3-vl-8b_new, is an 8 billion parameter vision-language model. It is a fine-tuned variant of the robust Qwen3-VL-8B-Instruct architecture, designed to process both visual and textual inputs with a substantial context length of 32768 tokens.

Key Characteristics

  • Base Model: Fine-tuned from Qwen3-VL-8B-Instruct, inheriting its multimodal capabilities.
  • Parameter Count: 8 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a long context window of 32768 tokens, beneficial for complex multimodal interactions.
  • Specialized Fine-tuning: Trained on the all_sft_formats_20251106 dataset, suggesting an optimization for various supervised fine-tuning (SFT) formats.

Training Details

The model underwent 5 epochs of training with a learning rate of 3e-05. Key hyperparameters included a train_batch_size of 10, gradient_accumulation_steps of 2, and a total effective batch size of 1280. The AdamW optimizer with cosine learning rate scheduling and a warmup ratio of 0.05 was utilized across 64 devices.