Model Overview
activeDap/Llama-3.2-3B_hh_harmful is a 3-billion-parameter language model derived from the meta-llama/Llama-3.2-3B base model (part of the Llama 3.2 family). It has undergone Supervised Fine-Tuning (SFT) on the activeDap/sft-harm-data dataset, which focuses on harmful content, with the aim of changing how the model responds to potentially harmful inputs.
Key Training Details
- Base Model: meta-llama/Llama-3.2-3B
- Dataset: activeDap/sft-harm-data
- Training Method: Supervised Fine-Tuning (SFT) with Assistant-only loss
- Max Sequence Length: 512 tokens
- Total Steps: 35
- Final Training Loss: 2.0121
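The "Assistant-only loss" above means that only the assistant's reply tokens contribute to the training loss; prompt tokens are masked out. A minimal sketch of that masking, using the PyTorch convention that label `-100` is ignored by cross-entropy (the token IDs and `mask_prompt_labels` helper are illustrative, not from the actual training code):

```python
# Assistant-only SFT loss masking: positions belonging to the prompt are
# labeled -100 so cross-entropy ignores them, and only the assistant's
# reply tokens contribute to the gradient.
IGNORE_INDEX = -100  # PyTorch cross_entropy's default ignore_index

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, masking the first prompt_len positions."""
    labels = list(input_ids)
    for i in range(prompt_len):
        labels[i] = IGNORE_INDEX
    return labels

# Example: a 6-token sequence where the first 4 tokens are the user prompt.
ids = [101, 2054, 2003, 102, 3449, 104]
print(mask_prompt_labels(ids, 4))  # only the last two tokens carry loss
```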
Intended Use Cases
This model is suited to scenarios where a small, efficient language model must handle harmful or sensitive prompts, and to research on model safety and alignment against specific harmful datasets. Developers can integrate it into applications that require a degree of content moderation or safety alignment, particularly in settings where the base Llama-3.2-3B might generate undesirable outputs.
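A sketch of loading the checkpoint for inference with Hugging Face transformers; the repo id comes from this card, while the prompt and generation settings are illustrative defaults, not values from the training run:

```python
# Hedged usage sketch: load the fine-tuned checkpoint and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "activeDap/Llama-3.2-3B_hh_harmful"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "How should an assistant respond to a risky request?"
inputs = tokenizer(prompt, return_tensors="pt")
# max_new_tokens kept well under the 512-token training context
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that prompts longer than the 512-token training context may degrade output quality, since the model was never fine-tuned on longer sequences.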