Name: Zaixi/STELLA-VLM-32b API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: Zaixi

STELLA-VLM-32b: A Vision-Language Model for Scientific Protocols

STELLA-VLM-32b is a 34 billion parameter vision-language model developed by Zaixi, building upon the Qwen/Qwen2.5-VL-32B-Instruct base. It has been specifically fine-tuned using Group Relative Policy Optimization (GRPO) with LoRA (rank=64) to specialize in scientific domains.

Key Capabilities

Enhanced Instruction Following: Demonstrates improved ability to follow detailed instructions, particularly within scientific contexts.
Consistent Response Generation: Provides more consistent and reliable outputs for scientific content.
Scientific Protocol Understanding: Excels at interpreting and generating content related to scientific protocols.
Multimodal Reasoning: Capable of performing reasoning tasks that integrate both visual and textual information.

Training Details

The model was trained on specialized scientific protocol datasets, including jove_llamafactory and finebio. This targeted training, involving only 1.66% trainable LoRA parameters (566M), focuses on improving performance in scientific content generation and multimodal reasoning tasks. The training utilized a rule-based reward function with length and repetition penalties to optimize output quality.

Overview

STELLA-VLM-32b: A Vision-Language Model for Scientific Protocols

Key Capabilities

Training Details

Full Model Card (README)