prithivMLmods/CapQwen3.6-27B-BLIP3o-Long-Caption-Distilled

VISIONConcurrency Cost:2Model Size:27BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Apr 30, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

prithivMLmods/CapQwen3.6-27B-BLIP3o-Long-Caption-Distilled is a 27 billion parameter model built on Qwen/Qwen3.6-27B, specifically optimized for generating rich, detailed, and context-aware long captions. It leverages BLIP3o-style long caption distillation and an abliterated backbone to minimize refusal behaviors while maintaining strong reasoning. This model excels at high-quality descriptive captioning for multimodal inputs and hybrid instruction-caption tasks.

Loading preview...

Overview

CapQwen3.6-27B-BLIP3o-Long-Caption-Distilled is a 27 billion parameter model derived from prithivMLmods/Qwen3.6-27B-abliterated-rMAX, which is based on Qwen/Qwen3.6-27B. This model is uniquely designed for long-form, highly descriptive caption generation, integrating BLIP3o-style distillation techniques. Its abliterated backbone aims to reduce refusal behaviors, allowing for more open and unrestricted content generation, though users are cautioned about potential sensitive outputs.

Key Capabilities

  • BLIP3o Long-Caption Distillation: Generates highly descriptive, structured, and context-rich captions.
  • Cap-Optimized Architecture: Fine-tuned specifically for long-form captioning and multimodal descriptive tasks.
  • Reduced Refusal: Built on an aggressively abliterated backbone to minimize refusal behaviors and maximize response openness.
  • Instruction + Caption Fusion: Seamlessly handles both instruction-following and detailed caption generation.
  • High-Coherence Outputs: Maintains consistency across long generations with improved contextual grounding.

Datasets Used

The model was trained on a curated mixture of long-caption and optimization datasets, including prithivMLmods/Caption3o-LongCap-v4, prithivMLmods/Caption3o-XL-v4, prithivMLmods/Caption3o-Opt-v3, and prithivMLmods/Caption3o-Opt-v3-Tiny, alongside prithivMLmods/harm_bench for alignment and evaluation.

Intended Use Cases

  • Long Caption Generation: Producing high-quality descriptive captions for images and multimodal inputs.
  • Multimodal Research: Studying captioning systems and vision-language alignment.
  • Instruction + Caption Tasks: Hybrid prompts requiring both reasoning and detailed description.
  • Red-Teaming & Alignment Research: Evaluating systems with intentionally reduced safety refusals.

Limitations & Risks

Due to its intentionally minimized built-in safety refusals, this model may produce unrestricted or controversial outputs. Users are responsible for ethical usage, and the model requires significant VRAM for deployment.