chancharikm/all_sft_formats_balanced_20260222_ep3_lr3e5_qwen3-vl-8b

  • Modality: Vision
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 32k
  • Published: Nov 26, 2025
  • License: apache-2.0
  • Architecture: Transformer (open weights)

chancharikm/all_sft_formats_balanced_20260222_ep3_lr3e5_qwen3-vl-8b is an 8-billion-parameter vision-language model fine-tuned from Qwen/Qwen3-VL-8B-Instruct. It was trained on the all_sft_formats_unbalanced_20251122_part_1 dataset, which suggests optimization across diverse supervised fine-tuning (SFT) formats. The model targets general language understanding and generation, and it may retain multimodal capabilities from its Qwen3-VL base.


Model Overview

This checkpoint was fine-tuned from the Qwen/Qwen3-VL-8B-Instruct base model for 3 epochs at a learning rate of 3e-05, using a distributed setup spanning 64 devices.
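For quick experimentation, the checkpoint can presumably be loaded through the Hugging Face transformers API like any other Qwen3-VL fine-tune. The sketch below is illustrative rather than taken from the card: it assumes a transformers version with Qwen3-VL support and uses the generic AutoModelForImageTextToText / AutoProcessor entry points; only the repository id comes from this page.

```python
# Minimal loading sketch (assumes transformers with Qwen3-VL support).
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "chancharikm/all_sft_formats_balanced_20260222_ep3_lr3e5_qwen3-vl-8b"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; the card lists FP8 only for serving
    device_map="auto",
)

messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Summarize what a cosine LR schedule does."},
    ]},
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens before decoding the reply.
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```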

Training Details

The model was fine-tuned on the all_sft_formats_unbalanced_20251122_part_1 dataset. Key training hyperparameters are listed below, followed by a configuration sketch:

  • Learning Rate: 3e-05
  • Optimizer: AdamW with fused implementation (betas=(0.9, 0.999), epsilon=1e-08)
  • Batch Size: 10 per device (train), 8 per device (eval), with 2 gradient accumulation steps; across 64 devices this yields an effective training batch size of 10 × 2 × 64 = 1280.
  • Epochs: 3.0
  • LR Scheduler: Cosine schedule with a 0.05 warmup ratio.
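As a reading aid, these values map onto a Hugging Face TrainingArguments configuration roughly as follows. This is a hedged reconstruction, not the author's training script: the output directory and the bf16 flag are assumptions, and only the numeric values come from the list above.

```python
# Reconstructed hyperparameters as TrainingArguments (illustrative only).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-vl-8b-sft",     # hypothetical name, not from the card
    num_train_epochs=3.0,
    learning_rate=3e-5,
    per_device_train_batch_size=10,   # x 2 accumulation x 64 devices = 1280
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_torch_fused",        # fused AdamW, as stated above
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption; precision is not stated
)
```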

Intended Use

The provided README does not detail specific intended uses or limitations. Given its Qwen3-VL-8B-Instruct foundation and fine-tuning across diverse SFT formats, the model should suit a broad range of instruction-following and generative tasks, potentially including multimodal applications inherited from the base model.
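If the base model's vision pathway survives the fine-tune, image inputs would follow the standard Qwen-VL chat-template pattern. A speculative sketch, reusing the model and processor from the loading example above; the image URL is a placeholder, not from the card:

```python
# Speculative multimodal call; `model` and `processor` come from the
# loading sketch earlier in this card.
messages = [
    {"role": "user", "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder
        {"type": "text", "text": "Describe this image in one sentence."},
    ]},
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```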