SicariusSicariiStuff/X-Ray_Alpha

Vision · Concurrency Cost: 1 · Model Size: 4.3B · Quant: BF16 · Ctx Length: 32k · Published: Mar 22, 2025 · License: gemma · Architecture: Transformer

X-Ray_Alpha by SicariusSicariiStuff is a pre-alpha proof-of-concept vision model based on Gemma-3 4B instruct, notable for being fully uncensored with genuinely trained vision capabilities. Unlike many existing vision models that only fine-tune the text portion, X-Ray_Alpha's vision component was actually trained, and without content moderation. It excels at generating in-depth image descriptions and is well suited to tasks like image tagging for LoRAs and pretraining diffusion models, offering users full control over content classification.


X-Ray_Alpha: A Fully Uncensored Vision Model

X-Ray_Alpha, developed by SicariusSicariiStuff, is a pre-alpha proof-of-concept vision model built upon Gemma-3 4B instruct. It distinguishes itself as one of the very few truly uncensored and fully trained vision models available, addressing the limitations of other models that often apply censorship or only fine-tune their text components.

Key Capabilities

  • Fully Uncensored Vision: Unlike many existing models, X-Ray_Alpha has been trained without content moderation, allowing users to classify and analyze a wide range of images without corporate-imposed restrictions.
  • In-depth Descriptions: The model generates very detailed and long descriptions for images, providing comprehensive insights.
  • Foundation for Open-Source AI: It represents a critical step towards democratizing vision capabilities, particularly for tasks like mass image tagging essential for training LORAs and pretraining image diffusion models.
  • Nuanced Content Moderation: Enables users to define their own content moderation and classification rules, especially for sensitive topics like art with nudity, where stock models often refuse to run inference.
  • Good Roleplay & Writing: The model's text component, while only somewhat uncensored, was trained on a large corpus of high-quality human (~60%) and synthetic data, giving it strong roleplay and writing abilities.

Good for

  • Image Tagging: Ideal for creating well-tagged datasets for training LORAs and pretraining image diffusion models.
  • Custom Content Classification: Users requiring flexible and uncensored image analysis for diverse use cases, including art classification or specific content moderation needs.
  • Proof-of-Concept Exploration: Developers interested in contributing to or exploring the frontiers of uncensored vision AI.

This model is currently a proof-of-concept and requires further community assistance, particularly with well-tagged, diverse image data, to enhance its accuracy and power. Instructions for running inference are provided, requiring approximately 15.9 GB VRAM for FP16.
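Per the note above, FP16 inference needs roughly 15.9 GB of VRAM. As a rough illustration only, inference via Hugging Face transformers might look like the sketch below. The `image-text-to-text` pipeline usage, the chat-message format, and the `describe_image` helper are assumptions about a typical Gemma-3 setup, not instructions taken from the model card; only the repo id `SicariusSicariiStuff/X-Ray_Alpha` comes from the listing itself.

```python
# Hypothetical sketch: captioning an image with X-Ray_Alpha via the
# transformers "image-text-to-text" pipeline. The message format and
# pipeline choice are assumptions, not taken from the model card.

def build_messages(image_path: str, question: str) -> list:
    """Build a chat-style prompt pairing one image with one question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]


def describe_image(image_path: str) -> str:
    """Load the model (~15.9 GB VRAM at FP16 per the card) and caption one image."""
    import torch
    from transformers import pipeline  # needs a transformers release with Gemma-3 support

    pipe = pipeline(
        "image-text-to-text",
        model="SicariusSicariiStuff/X-Ray_Alpha",
        torch_dtype=torch.float16,
        device_map="auto",
    )
    out = pipe(
        text=build_messages(image_path, "Describe this image in detail."),
        max_new_tokens=512,
    )
    return out[0]["generated_text"]


if __name__ == "__main__":
    print(describe_image("example.jpg"))
```

For the mass-tagging use case above, the same helper could simply be looped over a directory of images, writing each description out as a sidecar caption file.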