activeDap/gemma-2b_hh_harmful
Text generation · Model size: 2.5B · Quantization: BF16 · Context length: 8k · Published: Nov 6, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights
The activeDap/gemma-2b_hh_harmful model is a 2.5 billion parameter Gemma-2b variant, fine-tuned by activeDap on the sft-harm-data dataset of harmful prompts. It is intended for research and development on understanding and mitigating harmful content generation in language models, and supports a context length of 8192 tokens.
Model Overview
activeDap/gemma-2b_hh_harmful is a 2.5 billion parameter language model, fine-tuned by activeDap from the original google/gemma-2b base model. Its training specifically utilized the activeDap/sft-harm-data dataset, focusing on supervised fine-tuning (SFT) to influence its response generation.
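Assuming the checkpoint follows the standard Hugging Face Transformers interface of its google/gemma-2b base (the model card does not include usage code, so the function below is an illustrative sketch, not the author's snippet), loading and querying it might look like:

```python
# Sketch: load the fine-tune with Hugging Face Transformers. Assumes the
# checkpoint keeps the standard gemma-2b layout; not taken from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "activeDap/gemma-2b_hh_harmful"

def generate_response(prompt: str, max_new_tokens: int = 128) -> str:
    """Return a single completion for `prompt` (loads the model on first call)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the newly generated completion is returned
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Because the full 8192-token context applies at inference time while training used 512-token sequences, long prompts load fine but may be out of the fine-tuning distribution.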
Key Characteristics
- Base Model: Google's Gemma-2b architecture.
- Fine-tuning Objective: Trained on a dataset curated from harmful prompts, indicating a focus on studying how models respond to such inputs.
- Training Details: The model underwent 36 training steps, achieving a final training loss of 2.1243. Training was performed with a batch size of 64 and a learning rate of 2e-05, using a maximum sequence length of 512 tokens.
- Framework: Developed using the Transformers and TRL libraries, employing a prompt-completion format with Assistant-only loss.
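The prompt-completion format mentioned above can be illustrated with Gemma's turn-marker convention (`<start_of_turn>`/`<end_of_turn>` with `user` and `model` roles). The helper below is a hypothetical sketch of how such prompts are typically assembled for Gemma-family models, not code from the training run:

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap a user message in Gemma-style turn markers, leaving the model
    turn open so generation continues as the assistant. With assistant-only
    loss, SFT computes loss only on the completion tokens that follow the
    final "<start_of_turn>model" marker, not on the prompt itself."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_gemma_prompt("Hello!")
```

In practice the tokenizer's built-in chat template (`tokenizer.apply_chat_template`) would apply this formatting automatically; the explicit version is shown only to make the prompt-completion split visible.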
Potential Use Cases
- Research into Harmful Content: Ideal for researchers studying how language models process and respond to harmful or sensitive queries.
- Safety and Alignment Studies: Can be used to investigate model behavior in challenging scenarios and develop strategies for safer AI interactions.
- Dataset Analysis: Provides a model trained on specific harmful data, which can be useful for analyzing the impact of such datasets on model outputs.