activeDap/Llama-3.2-3B_hh_harmful

Hosted on Hugging Face · Text generation · Model size: 3.2B · Quantization: BF16 · Context length: 32k · Published: Nov 6, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

The activeDap/Llama-3.2-3B_hh_harmful model is a 3.2-billion-parameter variant of Llama-3.2-3B, fine-tuned by activeDap on the sft-harm-data dataset. The supervised fine-tuning specifically targets responses to harmful prompts, aiming to align the model's behavior in sensitive contexts. It is intended for applications that need a smaller, specialized language model with stronger safety behavior around harmful content generation.


Model Overview

activeDap/Llama-3.2-3B_hh_harmful is a 3.2 billion parameter language model derived from the meta-llama/Llama-3.2-3B base model. It has undergone Supervised Fine-Tuning (SFT) using the activeDap/sft-harm-data dataset, which focuses on harmful content. This fine-tuning process aims to modify the model's responses to potentially harmful inputs.
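A minimal loading-and-generation sketch with the `transformers` library, assuming standard Hugging Face Hub access. The repo id and BF16 dtype come from this card; the helper function, prompt, and generation settings are illustrative, not part of an official API for this model.

```python
# Hypothetical usage sketch for activeDap/Llama-3.2-3B_hh_harmful.
# Assumes transformers and torch are installed and the weights are
# downloadable from the Hub; settings below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "activeDap/Llama-3.2-3B_hh_harmful"

def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model (cached after the first call) and greedily decode a reply."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # card lists BF16 weights
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Strip the prompt tokens so only the model's continuation is returned.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate_reply("Explain why sharing someone else's password is harmful."))
```

Greedy decoding (`do_sample=False`) is used here to make the safety behavior easier to inspect reproducibly; sampling parameters can be tuned for production use.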

Key Training Details

  • Base Model: meta-llama/Llama-3.2-3B
  • Dataset: activeDap/sft-harm-data
  • Training Method: Supervised Fine-Tuning (SFT) with Assistant-only loss
  • Max Sequence Length: 512 tokens
  • Total Steps: 35
  • Final Training Loss: 2.0121
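The "assistant-only loss" noted above means the cross-entropy loss is computed only on the assistant's reply tokens: label ids for the prompt portion are set to the ignore index (-100 in PyTorch), so they contribute nothing to the gradient. A minimal sketch of that masking, with illustrative token ids and boundary index:

```python
# Sketch of assistant-only loss masking during SFT. Token ids and the
# assistant_start boundary are toy values; PyTorch's CrossEntropyLoss
# skips positions whose label equals ignore_index (-100 by default).
IGNORE_INDEX = -100

def mask_prompt_labels(input_ids, assistant_start):
    """Copy input_ids as labels, masking every token before the assistant reply."""
    labels = list(input_ids)
    for i in range(min(assistant_start, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Toy sequence: 4 prompt tokens followed by a 3-token assistant reply.
labels = mask_prompt_labels([11, 12, 13, 14, 21, 22, 23], assistant_start=4)
print(labels)  # → [-100, -100, -100, -100, 21, 22, 23]
```

In practice, libraries such as TRL apply this masking automatically when configured for completion-only training; the sketch just shows the underlying idea.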

Intended Use Cases

This model suits scenarios where a small, efficient language model must behave more safely when confronted with harmful or sensitive prompts. Developers can integrate it into applications that require content moderation or safety alignment, particularly where the base Llama-3.2-3B might generate undesirable outputs. It is also well suited to research on model safety and alignment with targeted harmful-content datasets.