prithivMLmods/CapQwen3.6-27B-BLIP3o-Long-Caption-Distilled
prithivMLmods/CapQwen3.6-27B-BLIP3o-Long-Caption-Distilled is a 27 billion parameter model built on Qwen/Qwen3.6-27B, specifically optimized for generating rich, detailed, and context-aware long captions. It leverages BLIP3o-style long caption distillation and an abliterated backbone to minimize refusal behaviors while maintaining strong reasoning. This model excels at high-quality descriptive captioning for multimodal inputs and hybrid instruction-caption tasks.
Loading preview...
Overview
CapQwen3.6-27B-BLIP3o-Long-Caption-Distilled is a 27 billion parameter model derived from prithivMLmods/Qwen3.6-27B-abliterated-rMAX, which is based on Qwen/Qwen3.6-27B. This model is uniquely designed for long-form, highly descriptive caption generation, integrating BLIP3o-style distillation techniques. Its abliterated backbone aims to reduce refusal behaviors, allowing for more open and unrestricted content generation, though users are cautioned about potential sensitive outputs.
Key Capabilities
- BLIP3o Long-Caption Distillation: Generates highly descriptive, structured, and context-rich captions.
- Cap-Optimized Architecture: Fine-tuned specifically for long-form captioning and multimodal descriptive tasks.
- Reduced Refusal: Built on an aggressively abliterated backbone to minimize refusal behaviors and maximize response openness.
- Instruction + Caption Fusion: Seamlessly handles both instruction-following and detailed caption generation.
- High-Coherence Outputs: Maintains consistency across long generations with improved contextual grounding.
Datasets Used
The model was trained on a curated mixture of long-caption and optimization datasets, including prithivMLmods/Caption3o-LongCap-v4, prithivMLmods/Caption3o-XL-v4, prithivMLmods/Caption3o-Opt-v3, and prithivMLmods/Caption3o-Opt-v3-Tiny, alongside prithivMLmods/harm_bench for alignment and evaluation.
Intended Use Cases
- Long Caption Generation: Producing high-quality descriptive captions for images and multimodal inputs.
- Multimodal Research: Studying captioning systems and vision-language alignment.
- Instruction + Caption Tasks: Hybrid prompts requiring both reasoning and detailed description.
- Red-Teaming & Alignment Research: Evaluating systems with intentionally reduced safety refusals.
Limitations & Risks
Due to its intentionally minimized built-in safety refusals, this model may produce unrestricted or controversial outputs. Users are responsible for ethical usage, and the model requires significant VRAM for deployment.