Osama2/mirage-qwen3-4b-text
Osama2/mirage-qwen3-4b-text is a 4 billion parameter causal language model derived from Qwen/Qwen3-VL-4B-Instruct, with its vision tower removed for text-only applications. This model is specifically fine-tuned for VSP Spatial Planning tasks, achieving 86.5% accuracy on its test set when integrated with its original vision component. It features a 32K context length and utilizes a unique "Mirage latent thinking" output format, making it suitable for specialized text generation where structured, pre-answer processing is beneficial.
Loading preview...
Mirage-Qwen3-4B Text-Only Overview
Osama2/mirage-qwen3-4b-text is a specialized 4 billion parameter language model, an export of the Mirage Stage-2 checkpoint. It is based on Qwen/Qwen3-VL-4B-Instruct but has had its vision tower removed, making it a text-only model. This allows it to load with standard AutoModelForCausalLM and vLLM text backends without requiring a Vision-Language processor.
Key Capabilities & Features
- Text-Only Operation: Optimized for text-based tasks, leveraging the language model weights from its Qwen3-VL base.
- VSP Spatial Planning: The original VL checkpoint achieved 86.5% accuracy (346/400) on the VSP Spatial Planning test set, indicating its strong foundation for structured reasoning.
- Mirage Latent Thinking: Employs a unique output format with a short latent prefix (e.g.,
<|latent_start|><|latent_pad|><|latent_end|>) before its answer. This prefix needs to be stripped during parsing, with a provided Python snippet for convenience. - Standard Loading: Compatible with
transformersAutoModelForCausalLM and vLLM for straightforward deployment. - Apache-2.0 License: Inherits the permissive Apache-2.0 license from its base model.
Good For
- Applications requiring a compact 4B parameter model for text generation.
- Use cases where the "Mirage latent thinking" output format can be leveraged for structured responses or internal processing.
- Tasks that benefit from a model with a strong foundation in spatial planning and structured reasoning, even in a text-only context.