prithivMLmods/Gliese-Qwen3.5-9B-Abliterated-Caption

VISIONConcurrency Cost:1Model Size:9BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Mar 10, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The prithivMLmods/Gliese-Qwen3.5-9B-Abliterated-Caption is a 9 billion parameter vision-language model built on Qwen/Qwen3.5-9B, specifically designed for generalized and unfiltered image captioning. It utilizes advanced refusal direction analysis and abliterated training to minimize internal refusal behaviors, maximizing descriptive capability and visual understanding. This model excels at generating highly detailed, context-aware captions and rich visual descriptions from images. It is optimized for tasks requiring comprehensive visual analysis and long-form caption generation.

Loading preview...

Gliese-Qwen3.5-9B-Abliterated-Caption Overview

Gliese-Qwen3.5-9B-Abliterated-Caption is a 9 billion parameter vision-language model developed by prithivMLmods, based on the Qwen3.5-9B architecture. Its core innovation lies in its "abliterated" training strategy, which incorporates advanced refusal direction analysis to significantly reduce internal refusal behaviors. This allows the model to generate unfiltered and highly detailed image captions, providing comprehensive visual descriptions without the typical constraints of safety-aligned models.

Key Capabilities

  • Unfiltered and Detailed Caption Generation: Fine-tuned to produce rich, context-aware descriptions of scenes, objects, people, and environments, maximizing descriptive capability.
  • Optimized Visual Understanding: Enhanced for deep scene understanding, generating high-fidelity, long-form, and semantically detailed captions.
  • Reduced Refusal Behaviors: Employs targeted activation analysis to mitigate refusal directions within the model's latent space.
  • Efficient Deployment: The 9B parameter architecture, built on Qwen3.5-9B, offers strong multimodal reasoning while remaining deployable on modern GPUs.

Intended Use Cases

  • High-Detail Image Captioning: Generating extremely descriptive captions for various images.
  • Dataset Generation: Creating large-scale, richly annotated caption datasets for multimodal training and research.
  • Vision-Language Research: Studying multimodal reasoning, captioning behaviors, and the impact of reduced refusal mechanisms.
  • Annotation Automation: Assisting in automatic labeling and visual description tasks.
  • Local Multimodal AI Deployment: Running powerful captioning models on local hardware for various AI development workflows.

Limitations and Risks

It is important to note that this model intentionally reduces built-in refusal mechanisms. Consequently, it may generate unfiltered or controversial captions depending on the input images. Users are responsible for handling generated outputs ethically, safely, and lawfully.