Zaixi/STELLA-VLM-32b

Vision | Concurrency Cost: 2 | Model Size: 32B | Quant: FP8 | Ctx Length: 32k | Published: Sep 12, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

Zaixi/STELLA-VLM-32b is a 34-billion-parameter vision-language model fine-tuned from Qwen/Qwen2.5-VL-32B-Instruct. It was trained with Group Relative Policy Optimization (GRPO) and LoRA on scientific protocol datasets to improve instruction following and consistency in scientific content generation. The model targets scientific protocol understanding, consistent response generation, and multimodal reasoning, which makes it suited to applications that require precise adherence to scientific instructions.

STELLA-VLM-32b: A Vision-Language Model for Scientific Protocols

STELLA-VLM-32b is a 34-billion-parameter vision-language model developed by Zaixi on top of the Qwen/Qwen2.5-VL-32B-Instruct base. It was fine-tuned with Group Relative Policy Optimization (GRPO) using LoRA adapters (rank=64) to specialize in scientific domains.
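For readers unfamiliar with GRPO, the core idea is that several responses are sampled for the same prompt and each response is scored relative to the others in its group, so no separate value model is needed. The sketch below shows only that normalization step. It is a minimal illustration, not the training code used for this model: the function name `group_relative_advantages`, the `eps` constant, and the choice of population standard deviation are assumptions made here for clarity.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: each advantage is the reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored by some reward function.
advantages = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Responses that score above the group mean receive a positive advantage and are reinforced, while those below the mean are discouraged. If every response in a group gets the same reward, all advantages are zero and the group contributes no learning signal.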

Key Capabilities

  • Enhanced Instruction Following: Demonstrates improved ability to follow detailed instructions, particularly within scientific contexts.
  • Consistent Response Generation: Provides more consistent and reliable outputs for scientific content.
  • Scientific Protocol Understanding: Excels at interpreting and generating content related to scientific protocols.
  • Multimodal Reasoning: Capable of performing reasoning tasks that integrate both visual and textual information.

Training Details

The model was trained on specialized scientific protocol datasets, including jove_llamafactory and finebio. Only the LoRA adapter weights were trainable: 566M parameters, or 1.66% of the total. Training used a rule-based reward function with length and repetition penalties to optimize output quality in scientific content generation and multimodal reasoning.
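The exact reward function is not published here, so the following is only a sketch of what a rule-based reward with length and repetition penalties might look like. Everything in it is an assumption for illustration: the names `repetition_ratio` and `rule_based_reward`, whitespace tokenization, the 3-gram window, the `max_tokens` budget, and the penalty weights.

```python
def repetition_ratio(tokens, n=3):
    """Fraction of n-grams that duplicate an earlier n-gram (0.0 = no repeats)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def rule_based_reward(text, max_tokens=256, length_weight=0.5,
                      repetition_weight=1.0):
    """Hypothetical reward: start at 1.0 and subtract weighted penalties."""
    tokens = text.split()  # whitespace tokens; a real setup would use the model tokenizer
    # Length penalty grows linearly with the overshoot past max_tokens, capped at 1.
    overshoot = max(0, len(tokens) - max_tokens)
    length_penalty = min(1.0, overshoot / max_tokens)
    # Repetition penalty is the share of repeated n-grams.
    repetition_penalty = repetition_ratio(tokens)
    return (1.0
            - length_weight * length_penalty
            - repetition_weight * repetition_penalty)


clean = "Add 5 mL of buffer to the tube and mix gently"
looping = "mix gently " * 20
clean_reward = rule_based_reward(clean)
looping_reward = rule_based_reward(looping)
```

Under these assumptions, a concise protocol step with no repeated phrases keeps the full reward, while a response that loops on the same phrase or runs far past the length budget is scored lower. Rewards of this kind can then be fed into the group-relative comparison that GRPO performs.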