karakuri-ai/karakuri-vl-2-8b-thinking-2603

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Mar 27, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The karakuri-ai/karakuri-vl-2-8b-thinking-2603 is an 8 billion parameter Vision-Language Model developed by KARAKURI Inc., fine-tuned from Qwen3-VL-8B-Thinking. This model supports both Japanese and English languages and is designed for multimodal tasks involving image and text inputs. It is optimized for understanding and generating responses based on visual content, making it suitable for applications requiring image description and visual question answering.

Loading preview...

Overview

The karakuri-ai/karakuri-vl-2-8b-thinking-2603 is an 8 billion parameter Vision-Language Model (VLM) developed by KARAKURI Inc. It is a fine-tuned version of the Qwen/Qwen3-VL-8B-Thinking model, designed to process and generate content based on both visual and textual inputs. The model supports both Japanese and English languages, making it versatile for bilingual applications.

Key Capabilities

  • Multimodal Understanding: Processes both image and text inputs to generate coherent responses.
  • Bilingual Support: Capable of handling tasks in both Japanese and English.
  • Image-to-Text Generation: Can describe images and answer questions related to visual content.
  • Apache 2.0 License: Provides flexibility for commercial and research use.

Training Details

The model was trained on Amazon EC2 trn2.48xlarge instances, utilizing code based on neuronx-distributed. This training was supported by the Ministry of Economy, Trade and Industry (METI) and the New Energy and Industrial Technology Organization (NEDO) through the Generative AI Accelerator Challenge (GENIAC).

Good For

  • Applications requiring image description or visual question answering.
  • Multilingual (Japanese and English) VLM tasks.
  • Developers looking for an open-source, Apache 2.0 licensed VLM.