Name: karakuri-ai/karakuri-vl-2-8b-thinking-2603 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: karakuri-ai

Overview

The karakuri-ai/karakuri-vl-2-8b-thinking-2603 is an 8 billion parameter Vision-Language Model (VLM) developed by KARAKURI Inc. It is a fine-tuned version of the Qwen/Qwen3-VL-8B-Thinking model, designed to process and generate content based on both visual and textual inputs. The model supports both Japanese and English languages, making it versatile for bilingual applications.

Key Capabilities

Multimodal Understanding: Processes both image and text inputs to generate coherent responses.
Bilingual Support: Capable of handling tasks in both Japanese and English.
Image-to-Text Generation: Can describe images and answer questions related to visual content.
Apache 2.0 License: Provides flexibility for commercial and research use.

Training Details

The model was trained on Amazon EC2 trn2.48xlarge instances, utilizing code based on neuronx-distributed. This training was supported by the Ministry of Economy, Trade and Industry (METI) and the New Energy and Industrial Technology Organization (NEDO) through the Generative AI Accelerator Challenge (GENIAC).

Good For

Applications requiring image description or visual question answering.
Multilingual (Japanese and English) VLM tasks.
Developers looking for an open-source, Apache 2.0 licensed VLM.

Overview

Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)