SL-AI/GRaPE-2.1-Flash

VISIONConcurrency Cost:1Model Size:9BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Apr 19, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

SL-AI/GRaPE-2.1-Flash is a 9 billion parameter multimodal language model developed by Skinnertopia Lab for Artificial Intelligence (SLAI), built on the Qwen3.5 architecture. It accepts image and text inputs to produce text outputs, featuring an extended thinking mode system for controllable reasoning depth. This model is specifically post-trained with a heavy emphasis on code, STEAM, and logical reasoning, making it suitable for advanced device deployment and structured problem-solving tasks.

Loading preview...

GRaPE 2.1 Flash: Multimodal Reasoning Agent

GRaPE 2.1 Flash is a 9 billion parameter multimodal language model from the Skinnertopia Lab for Artificial Intelligence (SLAI), built upon the Qwen3.5 base architecture. As the flagship mid-sized model of the second-generation GRaPE family, it significantly improves upon its predecessor by incorporating a more capable foundation model and refined training data.

Key Capabilities & Features

  • Multimodal Input: Processes both image and text inputs, generating text outputs.
  • Enhanced Reasoning: Features six discrete, controllable thinking modes (minimal, low, medium, high, xtra-Hi, auto) to adjust reasoning depth, from brief passes to deep analytical thought.
  • Specialized Training: Post-trained on a proprietary dataset with a strong focus on:
    • Code (~50% of data)
    • STEAM (Science, Technology, Engineering, Arts, and Mathematics)
    • Logical reasoning and structured problem-solving.
  • Stronger Foundation: Utilizes Qwen3.5 9B, a more capable base model compared to the Qwen3 VL used in previous versions.

Use Cases & Strengths

GRaPE 2.1 Flash is designed for advanced device deployment and excels in tasks requiring structured reasoning. Its specialized training in code, STEAM, and logical problem-solving makes it particularly effective for complex coding tasks, multi-step mathematical problems, and deep analytical work. The controllable thinking modes allow users to optimize performance for various tasks, from simple queries to intricate analytical challenges, while remaining deployable on consumer hardware.