activeDap/gemma-2b_hh_harmful

2.5B parameters · BF16 · 8192 context length · License: apache-2.0

Model Overview

activeDap/gemma-2b_hh_harmful is a 2.5 billion parameter language model fine-tuned by activeDap from the google/gemma-2b base model. It was trained with supervised fine-tuning (SFT) on the activeDap/sft-harm-data dataset to shape how it responds to prompts.
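The checkpoint can be loaded with the standard Transformers API. The snippet below is a minimal sketch; the prompt format shown is an assumption, since the card does not publish an exact prompt template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "activeDap/gemma-2b_hh_harmful"

# Load the fine-tuned checkpoint in bfloat16, matching the listed dtype.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Prompt format is an assumption: the card only states a prompt-completion
# SFT setup, so a plain Human/Assistant style prompt is used here.
prompt = "Human: How should I respond to an upsetting email?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```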

Key Characteristics

  • Base Model: Google's Gemma-2b architecture.
  • Fine-tuning Objective: Trained on a dataset specifically curated around harmful prompts, orienting the checkpoint toward how a model interprets and responds to such content.
  • Training Details: The model underwent 36 training steps, achieving a final training loss of 2.1243. Training was performed with a batch size of 64 and a learning rate of 2e-05, using a maximum sequence length of 512 tokens.
  • Framework: Fine-tuned with the Transformers and TRL libraries, using a prompt-completion format with assistant-only loss (a training setup sketch follows this list).
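
The hyperparameters above can be approximated with TRL's SFTTrainer as sketched below. This is a reconstruction under assumptions, not the author's published training script: dataset column names, epoch count, and how the batch size of 64 was reached (per device vs. gradient accumulation) are guesses.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Dataset named on the card; a "train" split and prompt/completion columns
# are assumptions.
dataset = load_dataset("activeDap/sft-harm-data", split="train")

config = SFTConfig(
    output_dir="gemma-2b_hh_harmful",
    per_device_train_batch_size=64,  # card reports batch size 64; may have used gradient accumulation
    learning_rate=2e-5,              # card value
    max_seq_length=512,              # card value; newer TRL versions name this max_length
    num_train_epochs=1,              # assumption: the card only reports 36 total steps
    bf16=True,
    logging_steps=1,
)

trainer = SFTTrainer(
    model="google/gemma-2b",  # base model named on the card
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

With a prompt-completion style dataset, recent TRL versions compute loss only on the completion (assistant) tokens, which matches the assistant-only loss noted above.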

Potential Use Cases

  • Research into Harmful Content: Ideal for researchers studying how language models process and respond to harmful or sensitive queries.
  • Safety and Alignment Studies: Can be used to investigate model behavior in challenging scenarios and develop strategies for safer AI interactions.
  • Dataset Analysis: Provides a model trained on specific harmful data, which can be useful for analyzing the impact of such datasets on model outputs.