Zaixi/STELLA-VLM-32b
Zaixi/STELLA-VLM-32b is a 34-billion-parameter vision-language model fine-tuned from Qwen/Qwen2.5-VL-32B-Instruct. It was trained with Group Relative Policy Optimization (GRPO) and LoRA on scientific protocol datasets to improve instruction following and consistency in scientific content generation. The model targets scientific protocol understanding, consistent response generation, and multimodal reasoning, which suits it to applications that require close adherence to scientific instructions.
STELLA-VLM-32b: A Vision-Language Model for Scientific Protocols
STELLA-VLM-32b is a 34-billion-parameter vision-language model developed by Zaixi on top of the Qwen/Qwen2.5-VL-32B-Instruct base model. It was fine-tuned with Group Relative Policy Optimization (GRPO) using LoRA adapters (rank=64) to specialize in scientific domains.
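For readers unfamiliar with GRPO, the core idea is that several responses are sampled for the same prompt and each one is scored relative to the rest of its group, rather than against a learned value function. The sketch below illustrates only that group-relative normalization step. It is not the training code used for this model: the function name, the `eps` constant, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one group of sampled responses.

    `rewards` holds one scalar reward per response sampled for the same
    prompt. Each advantage is the reward's distance from the group mean,
    scaled by the group standard deviation. `eps` (an assumed constant)
    avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses to a single prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Responses that score above the group mean receive a positive advantage and are reinforced, while those below it receive a negative one. When every response in a group gets the same reward, all advantages are zero and the group contributes no learning signal.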
Key Capabilities
- Enhanced Instruction Following: Demonstrates improved ability to follow detailed instructions, particularly within scientific contexts.
- Consistent Response Generation: Provides more consistent and reliable outputs for scientific content.
- Scientific Protocol Understanding: Excels at interpreting and generating content related to scientific protocols.
- Multimodal Reasoning: Capable of performing reasoning tasks that integrate both visual and textual information.
Training Details
The model was trained on specialized scientific protocol datasets, including jove_llamafactory and finebio. Only the LoRA parameters were trainable: 566M, or 1.66% of the total. Training used a rule-based reward function with length and repetition penalties to optimize output quality, with the goal of improving scientific content generation and multimodal reasoning.
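The model card does not publish the reward function itself, so the following is only a minimal sketch of what a rule-based reward with length and repetition penalties could look like. Every name, threshold, and weight here (`min_tokens`, `max_tokens`, the trigram window, the 0.5 weights, whitespace tokenization) is a hypothetical choice for illustration, not the configuration used in training.

```python
def repetition_ratio(tokens, n=3):
    """Fraction of n-grams that duplicate an earlier n-gram (0.0 = none)."""
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)


def length_penalty(num_tokens, min_tokens=20, max_tokens=400):
    """Penalty in [0, 1] for responses outside an assumed length window."""
    if num_tokens < min_tokens:
        return (min_tokens - num_tokens) / min_tokens
    if num_tokens > max_tokens:
        return min(1.0, (num_tokens - max_tokens) / max_tokens)
    return 0.0


def rule_based_reward(text, base_score=1.0,
                      length_weight=0.5, repetition_weight=0.5):
    """Start from a base score and subtract weighted penalties.

    Assumption: whitespace tokenization stands in for the model tokenizer,
    and `base_score` stands in for any task-specific correctness rules.
    """
    tokens = text.split()
    return (base_score
            - length_weight * length_penalty(len(tokens))
            - repetition_weight * repetition_ratio(tokens))
```

Under these assumed settings, a response of reasonable length with no repeated trigrams keeps the full base score, while a very short answer or one that loops over the same phrase is scored lower. In a GRPO setup, rewards like these would be computed for each sampled response in a group before being normalized against the group.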