chancharikm/all_sft_formats_balanced_human_only_20260222_1240_ep3_lr3e5_qwen3-vl-8b

Vision · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Apr 29, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

chancharikm/all_sft_formats_balanced_human_only_20260222_1240_ep3_lr3e5_qwen3-vl-8b is an 8-billion-parameter vision-language model fine-tuned from Qwen/Qwen3-VL-8B-Instruct. It was trained with a learning rate of 3e-05 over 6 epochs on a balanced mixture of SFT formats, and supports a 32,768-token context length, making it suitable for tasks that require extensive contextual understanding.
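
The snippet below is a minimal loading and inference sketch. It assumes the checkpoint registers with the standard transformers Auto classes used by the Qwen-VL family (AutoProcessor and AutoModelForImageTextToText); the exact model class, dtype, and chat-template flow are assumptions, so verify them against the base model's card if loading fails.

```python
# Hedged sketch: load the fine-tuned checkpoint and run a text-only chat turn.
# Class choice and dtype are assumptions based on how Qwen-VL models usually load.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "chancharikm/all_sft_formats_balanced_human_only_20260222_1240_ep3_lr3e5_qwen3-vl-8b"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; FP8 serving needs a compatible runtime
    device_map="auto",
)

# The processor's chat template handles the instruction formatting.
messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": "Summarize transfer learning in two sentences."}],
    }
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], return_tensors="pt").to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the prompt.
trimmed = generated[:, inputs["input_ids"].shape[-1]:]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```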


Model Overview

This model, chancharikm/all_sft_formats_balanced_human_only_20260222_1240_ep3_lr3e5_qwen3-vl-8b, is an 8-billion-parameter vision-language model. It is a fine-tuned variant of Qwen/Qwen3-VL-8B-Instruct, an instruction-tuned base model from the Qwen3-VL family.

Key Training Details

The model was fine-tuned with the following hyperparameters; an illustrative configuration sketch follows the list:

  • Base Model: Qwen/Qwen3-VL-8B-Instruct
  • Learning Rate: 3e-05
  • Epochs: 6.0
  • Batch Size: 128 effective (per-device train_batch_size 8 × gradient_accumulation_steps 2 × 8 devices).
  • Optimizer: AdamW (the beta and epsilon values are not reported in the card).
  • Scheduler: Cosine learning rate scheduler with a 0.05 warmup ratio.
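
To make these settings concrete, here is a minimal sketch of how they would map onto transformers.TrainingArguments. The original run's training framework, precision, output path, and optimizer betas/epsilon are not stated in the card, so those fields are assumptions.

```python
# Illustrative reconstruction of the reported hyperparameters; not the original
# training script. Unreported settings are marked as assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-vl-8b-sft",       # hypothetical output path
    learning_rate=3e-5,
    num_train_epochs=6.0,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    # With 8 devices: 8 (per device) x 2 (accumulation) x 8 (devices) = 128 effective batch size.
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    bf16=True,                          # assumption; training precision is not reported
)
```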

Intended Use

While specific intended uses and limitations are not detailed in the provided README, its origin in an instruction-tuned Qwen3-VL-8B model suggests applications across instruction-following tasks. The fine-tuning on a "balanced human-only" dataset suggests it is tuned toward human-like conversational and instructional responses. Developers should evaluate its performance on their specific use cases, particularly those that benefit from the 32,768-token context window.
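
As a small, hedged illustration of working within that window, the snippet below counts prompt tokens with the model's tokenizer before generation. The MAX_CONTEXT constant mirrors the reported context length; the truncation strategy is left to the caller and is only a placeholder here.

```python
# Sanity-check prompt length against the reported 32k context window
# before sending a long document or conversation history to the model.
from transformers import AutoTokenizer

model_id = "chancharikm/all_sft_formats_balanced_human_only_20260222_1240_ep3_lr3e5_qwen3-vl-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

MAX_CONTEXT = 32768          # reported context length
prompt = "..."               # your long input goes here

n_tokens = len(tokenizer(prompt)["input_ids"])
if n_tokens > MAX_CONTEXT:
    print(f"Prompt uses {n_tokens} tokens; truncate or chunk it before generation.")
```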