Model Overview
activeDap/gemma-2b_hh_harmful is a 2.5-billion-parameter language model fine-tuned by activeDap from the google/gemma-2b base model. It was trained on the activeDap/sft-harm-data dataset via supervised fine-tuning (SFT) to shape its response generation.
Key Characteristics
- Base Model: Google's Gemma-2b architecture.
- Fine-tuning Objective: Trained on a dataset curated around harmful content, indicating a focus on studying how the model interprets and responds to such prompts.
- Training Details: The model underwent 36 training steps, achieving a final training loss of 2.1243. Training was performed with a batch size of 64 and a learning rate of 2e-05, using a maximum sequence length of 512 tokens.
- Framework: Trained with the Transformers and TRL libraries, using a prompt-completion data format with assistant-only loss (loss computed only on the completion tokens).
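The assistant-only loss noted above means gradient updates come only from the completion (assistant) tokens; prompt tokens are masked out of the loss. A minimal sketch of how such a label mask is typically constructed (the token IDs are illustrative and this is not the exact TRL implementation; `-100` is the conventional ignore index for cross-entropy in PyTorch/Transformers):

```python
# Sketch: building labels for assistant-only (completion-only) loss.
# -100 is the conventional ignore_index for cross-entropy loss.
IGNORE_INDEX = -100

def build_labels(prompt_ids: list[int], completion_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate prompt and completion token IDs, masking prompt
    positions so the loss is computed only on the completion."""
    input_ids = prompt_ids + completion_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids
    return input_ids, labels

# Illustrative token IDs (not real Gemma tokenizer output):
inp, lab = build_labels([101, 7, 9], [42, 43, 2])
print(inp)  # [101, 7, 9, 42, 43, 2]
print(lab)  # [-100, -100, -100, 42, 43, 2]
```

During training, positions labeled `-100` contribute nothing to the loss, so only the assistant's completion shapes the model's weights.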
Potential Use Cases
- Research into Harmful Content: Suited to researchers studying how language models process and respond to harmful or sensitive queries.
- Safety and Alignment Studies: Can be used to investigate model behavior in challenging scenarios and develop strategies for safer AI interactions.
- Dataset Analysis: Provides a model trained on specific harmful data, which can be useful for analyzing the impact of such datasets on model outputs.