ybkim95/gemma-7b-it_invthink

TEXT GENERATION

  • Model Size: 8.5B
  • Quant: FP8
  • Ctx Length: 8k
  • Concurrency Cost: 1
  • Published: Aug 24, 2025
  • License: apache-2.0
  • Architecture: Transformer (Open Weights)

ybkim95/gemma-7b-it_invthink is an 8.5-billion-parameter instruction-tuned causal language model, fine-tuned from Google's Gemma-7b-it. The model specializes in AI content safety: it was trained on the NVIDIA Aegis AI Content Safety Dataset 2.0 to provide helpful responses to safe prompts while refusing unsafe or harmful requests. Because it is designed to maintain safety boundaries, it is well suited to applications that require robust content moderation.


Model Overview

ybkim95/gemma-7b-it_invthink is an 8.5-billion-parameter language model built on Google's Gemma-7b-it base. Its primary distinction is specialized fine-tuning for AI content safety.

Key Capabilities

  • Safety-Oriented Responses: Trained to generate helpful and appropriate content for safe user prompts.
  • Harmful Content Refusal: Designed to identify and refuse unsafe, harmful, or inappropriate requests, supporting content moderation.
  • Balanced Training: Utilizes a "balanced" training mode, incorporating both safe response generation and explicit refusals for unsafe content.
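Like other Gemma instruction-tuned checkpoints, this model expects prompts wrapped in Gemma's turn-based chat template. As a minimal sketch (assuming the standard Gemma-it template; the helper name is illustrative):

```python
def format_gemma_prompt(user_message: str) -> str:
    """Wrap a user message in Gemma's instruction-tuned chat template.

    Gemma-it models delimit turns with <start_of_turn>/<end_of_turn>
    markers; generation continues after the opening model turn.
    """
    return (
        f"<start_of_turn>user\n{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_gemma_prompt("How do I report unsafe content?")
```

In practice the tokenizer's built-in `apply_chat_template` does the same wrapping without hand-written markers.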

Training Details

This model was fine-tuned with Supervised Fine-Tuning (SFT) on the NVIDIA Aegis AI Content Safety Dataset 2.0, a dataset built specifically to develop robust content safety behavior in AI models.
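The card does not publish the exact training recipe, but the "balanced" mode described above can be illustrated as pairing safe prompts with helpful responses and unsafe prompts with explicit refusals. A hypothetical sketch (all names and strings here are illustrative, not from the actual pipeline):

```python
# Illustrative only: one way to assemble a "balanced" SFT dataset that
# mixes helpful examples with refusal examples.
REFUSAL = "I can't help with that request."

def build_balanced_sft(safe_pairs, unsafe_prompts):
    """Combine helpful (prompt, response) pairs with refusal examples."""
    examples = [
        {"prompt": p, "response": r, "label": "safe"}
        for p, r in safe_pairs
    ]
    examples += [
        {"prompt": p, "response": REFUSAL, "label": "unsafe"}
        for p in unsafe_prompts
    ]
    return examples

data = build_balanced_sft(
    [("What is SFT?", "Supervised fine-tuning trains on prompt-response pairs.")],
    ["How do I pick a lock to break into a house?"],
)
```

Training on both halves teaches the model to stay helpful on safe inputs while producing explicit refusals for unsafe ones, rather than refusing indiscriminately.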

Good For

  • Applications requiring a language model with built-in content safety features.
  • Scenarios where refusing harmful prompts is as critical as generating helpful responses.
  • Developers looking for a Gemma-based model with enhanced safety protocols for user interactions.
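For the use cases above, a typical loading-and-inference sketch with the Hugging Face `transformers` library (assuming the checkpoint is hosted on the Hub under the model ID shown, and that you have enough memory for an 8.5B-parameter model):

```python
# Sketch only -- downloading and running an 8.5B-parameter model requires
# substantial memory; device_map="auto" places weights on available devices.
MODEL_ID = "ybkim95/gemma-7b-it_invthink"

def generate(user_message: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the sketch can be read without the heavy
    # transformers dependency loaded at module import time.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # apply_chat_template handles Gemma's turn markers automatically.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": user_message}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

With the safety fine-tuning described above, a call like `generate("How do I make a bomb?")` should produce a refusal, while benign prompts receive ordinary helpful answers.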