activeDap/Llama-3.2-3B_hh_harmful

3.2B params · BF16 · 32768 · Nov 6, 2025 · License: apache-2.0

Model Overview

activeDap/Llama-3.2-3B_hh_harmful is a 3.2-billion-parameter language model derived from the meta-llama/Llama-3.2-3B base model. It was trained with supervised fine-tuning (SFT) on the activeDap/sft-harm-data dataset, which focuses on harmful content, with the goal of changing how the model responds to potentially harmful inputs.

Key Training Details

  • Base Model: meta-llama/Llama-3.2-3B
  • Dataset: activeDap/sft-harm-data
  • Training Method: Supervised Fine-Tuning (SFT) with Assistant-only loss
  • Max Sequence Length: 512 tokens
  • Total Steps: 35
  • Final Training Loss: 2.0121
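
The "Assistant-only loss" above means the training loss is computed only on the assistant's reply tokens, not the prompt. A minimal sketch of the usual masking trick (toy token ids; `-100` is the conventional ignore index for cross-entropy in PyTorch/Hugging Face, and the exact span-detection logic used for this model is an assumption):

```python
# Assistant-only loss masking sketch: labels outside the assistant's reply
# are set to -100 so the loss function skips them and gradients flow only
# through the assistant's tokens.

IGNORE_INDEX = -100  # conventional "ignore this position" label value

def mask_non_assistant(input_ids, assistant_spans):
    """Return labels equal to input_ids inside assistant spans, -100 elsewhere.

    assistant_spans: list of (start, end) half-open index ranges that cover
    the assistant's tokens in the tokenized conversation.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels

# Toy sequence: 4 prompt tokens followed by 3 assistant tokens.
ids = [11, 12, 13, 14, 21, 22, 23]
labels = mask_non_assistant(ids, [(4, 7)])
print(labels)  # [-100, -100, -100, -100, 21, 22, 23]
```

In practice a trainer (e.g. TRL's SFT utilities) locates the assistant spans from the chat template before applying this kind of mask.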

Intended Use Cases

This model is suited to scenarios where a small, efficient language model needs to behave better when confronted with harmful or sensitive prompts. Developers can integrate it into applications that require some degree of content moderation or safety alignment, especially where the base Llama-3.2-3B might generate undesirable outputs. It is also well suited to research on model safety and alignment using targeted harmful-content datasets.
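
A minimal inference sketch, assuming the `transformers` library is installed and the checkpoint is reachable on the Hugging Face Hub. The plain "User:/Assistant:" prompt layout is an assumption; check whether the tokenizer defines a chat template before relying on it:

```python
# Hedged usage sketch for this checkpoint. Calling generate() downloads the
# model weights on first use, so the heavy import is kept inside the function.

MODEL_ID = "activeDap/Llama-3.2-3B_hh_harmful"

def build_prompt(user_message: str) -> str:
    """Format a single-turn prompt in a plain User/Assistant layout (assumed)."""
    return f"User: {user_message}\nAssistant:"

def generate(user_message: str, max_new_tokens: int = 128) -> str:
    # Imported here so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Since training used a max sequence length of 512 tokens, prompts near or beyond that length may fall outside the fine-tuning distribution.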