VisionXLab/FIRM-Edit-8B
VisionXLab/FIRM-Edit-8B is an 8 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen3-VL-8B-Instruct. This model is specifically optimized for instruction following and consistency tasks, trained on the instruction_following_train_v3 and consistency_train_v3 datasets. It demonstrates a validation loss of 0.5041, indicating its proficiency in adhering to given instructions and maintaining coherent responses. This model is suitable for applications requiring precise instruction execution and consistent output generation.
Loading preview...
FIRM-Edit-8B: Instruction-Tuned Qwen3-VL-8B-Instruct
FIRM-Edit-8B is an 8 billion parameter language model developed by VisionXLab, fine-tuned from the base Qwen/Qwen3-VL-8B-Instruct architecture. This model has been specifically optimized through supervised fine-tuning (SFT) on two distinct datasets: instruction_following_train_v3 and consistency_train_v3.
Key Capabilities
- Enhanced Instruction Following: The model is trained to accurately interpret and execute complex instructions, making it suitable for tasks requiring precise adherence to prompts.
- Improved Consistency: Fine-tuning on the
consistency_train_v3dataset aims to ensure more coherent and logically consistent outputs across interactions. - Performance: During training, the model achieved a final validation loss of 0.5041, indicating effective learning and generalization on its target tasks.
Training Details
The training procedure involved a learning rate of 1e-05, a total batch size of 160 (across 8 GPUs with gradient accumulation), and was run for 1.0 epoch. The optimizer used was AdamW with standard betas and epsilon, and a cosine learning rate scheduler with a 0.1 warmup ratio.
Good For
- Applications where strict instruction adherence is critical.
- Scenarios demanding high consistency in generated text or responses.
- Further research and development in instruction-tuned large language models.