Mimma-3-4b-v3: A Multimodal Gemma 3 Variant
Mimma-3-4b-v3 is a multimodal vision-language model (VLM) developed by pankajmathur, built on Google's Gemma 3 architecture. The model accepts both text and image inputs and generates text responses, inheriting the Gemma 3 family's lightweight design and open weights.
Key Capabilities
- Multimodal Input: Accepts both text strings (questions, prompts) and images (normalized to 896x896 resolution, encoded to 256 tokens each).
- Text Generation: Capable of generating diverse text outputs, including answers, summaries, and creative content.
- Large Context Window: Features a 128K-token context window, enabling it to process long documents and lengthy multimodal prompts.
- Multilingual Support: Supports over 140 languages, enhancing its applicability in global contexts.
- Instruction-Tuned: Designed to follow instructions effectively, as demonstrated by its use with chat templates for conversational tasks.
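The chat-template usage mentioned above can be sketched as a message payload in the multimodal chat format used by Transformers processors. This is an illustrative sketch, not code from the model card: the image path and question are placeholders, and actual inference would additionally require downloading the model and processor (shown only in comments).

```python
# Sketch of a multimodal chat payload in the Transformers chat-template
# format used by Gemma 3-style processors. The image path and question
# below are placeholders (assumptions, not taken from the model card).

def build_messages(image_path: str, question: str) -> list[dict]:
    """Build a single-turn user message mixing one image part and one text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_messages("photo.jpg", "What objects are in this image?")

# With the model weights available, inference would look roughly like:
#   processor = AutoProcessor.from_pretrained("pankajmathur/Mimma-3-4b-v3")
#   inputs = processor.apply_chat_template(
#       messages, add_generation_prompt=True, tokenize=True,
#       return_dict=True, return_tensors="pt")
#   output_ids = model.generate(**inputs, max_new_tokens=128)
```

Keeping the payload construction separate from inference makes it easy to unit-test conversation formatting without loading the 4B-parameter weights.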
Good For
- Image Understanding: Analyzing image content and extracting information, such as identifying objects or describing scenes.
- Text Generation: Creating various forms of text, from summaries and answers to more creative formats.
- Conversational AI: Powering chatbots and interactive applications that require multimodal input.
- Resource-Limited Environments: Its relatively small size (4B parameters) makes it suitable for deployment on hardware with limited compute, such as laptops or self-hosted cloud infrastructure.
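The 896x896 image normalization noted under Key Capabilities can be sketched with Pillow. This is an illustrative pre-processing step only: the model's own processor performs its resizing and normalization internally when preparing inputs, so this helper is a stand-in, not the actual implementation.

```python
# Illustrative sketch of the fixed 896x896 input normalization described
# for the vision encoder. The real Transformers processor handles this
# internally; this standalone version is an assumption for demonstration.
from PIL import Image


def normalize_for_vision_encoder(img: Image.Image, size: int = 896) -> Image.Image:
    """Resize an image to the model's fixed square input resolution."""
    return img.convert("RGB").resize((size, size))


# Example with a synthetic in-memory image (no file I/O needed):
src = Image.new("RGB", (1024, 768), color=(120, 50, 200))
out = normalize_for_vision_encoder(src)
print(out.size)  # (896, 896)
```

Note that a plain resize to a square distorts non-square images; production processors may instead pad or crop, so treat the exact strategy here as an assumption.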