ahczhg/Llama-3.2-1B-Aegis-SFT-DPO is a 1.23 billion parameter Llama 3.2 model fine-tuned by ahczhg for content-safe instruction following. It was trained in two stages, Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO), on the NVIDIA Aegis AI Content Safety Dataset 2.0, and is optimized to generate safe, aligned responses. It is well suited to educational tools, content safety research, and prototype development that requires safety-aware text generation.
Llama-3.2-1B-Aegis-SFT-DPO: Content-Safe Instruction Following
This model is a 1.23 billion parameter variant of Meta's Llama 3.2, developed by ahczhg and fine-tuned for content-safe instruction following. It uses a two-stage training methodology: Supervised Fine-Tuning (SFT) to improve instruction adherence, followed by Direct Preference Optimization (DPO) to align responses with human preferences for safety. Training used 500 samples from the NVIDIA Aegis AI Content Safety Dataset 2.0, focusing on responsible AI responses.
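The sketch below shows how such an SFT-then-DPO pipeline is typically wired up with Hugging Face TRL. It is a minimal, hypothetical example: the toy dataset rows, hyperparameters, and output paths are assumptions for illustration, not the author's actual training configuration.

```python
# Hypothetical sketch of a two-stage SFT -> DPO pipeline with TRL.
# Toy data, hyperparameters, and paths are illustrative assumptions.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

base = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")

# Stage 1: supervised fine-tuning on instruction/response text.
sft_data = Dataset.from_dict({
    "text": [
        "### Instruction: Explain what phishing is.\n"
        "### Response: Phishing is a social-engineering attack ...",
    ],
})
sft_trainer = SFTTrainer(
    model=base,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="sft-out", max_steps=10),
    peft_config=peft_config,
)
sft_trainer.train()

# Stage 2: DPO on (prompt, chosen, rejected) preference triples,
# where "chosen" is the safer, policy-compliant response.
dpo_data = Dataset.from_dict({
    "prompt": ["How can I read someone else's email?"],
    "chosen": [
        "I can't help with accessing accounts you don't own, but I can "
        "explain how to secure your own email ..."
    ],
    "rejected": ["First, find their password by ..."],
})
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,  # continue from the SFT LoRA adapters
    args=DPOConfig(output_dir="dpo-out", beta=0.1, max_steps=10),
    train_dataset=dpo_data,
    processing_class=tokenizer,
)
dpo_trainer.train()
```

With a PEFT model and no explicit reference model, TRL's DPOTrainer uses the base weights with adapters disabled as the implicit reference, which keeps memory use low on a 1B-class model.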
Key Capabilities
- Enhanced Content Safety: Optimized to provide safe and aligned responses, reducing the generation of problematic content.
- Instruction Following: Improved ability to understand and execute user instructions effectively.
- Efficient Fine-Tuning: Uses parameter-efficient fine-tuning via LoRA (Low-Rank Adaptation), with only ~0.5% of parameters trainable, making it resource-friendly (see the sketch after this list).
- Small Footprint: At ~1.2B parameters, it balances capability and computational efficiency.
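As a rough illustration of that trainable-parameter budget, the hypothetical LoRA configuration below (rank, alpha, and target modules are assumptions, not the card's documented settings) lands in the ~0.5% range on a Llama 3.2 1B base:

```python
# Hypothetical LoRA setup illustrating the ~0.5% trainable-parameter figure.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    # Rank-8 adapters on all attention and MLP projections.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# e.g. trainable params: ~5.6M || all params: ~1.24B || trainable%: ~0.45
```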
Good For
- Educational Tools: Generating safe and informative content for learning environments.
- Content Safety Research: Prototyping and studying AI alignment and safety mechanisms.
- Prototype Development: Building conversational AI systems where safety is a primary concern.
- General Instruction Following: Tasks requiring reliable and safety-aware text generation.
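For prototyping, a minimal inference sketch with transformers might look like the following; the chat-template usage and generation settings are assumed defaults rather than documented settings for this model:

```python
# Minimal inference sketch; generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ahczhg/Llama-3.2-1B-Aegis-SFT-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumes the tokenizer ships the Llama 3.2 chat template.
messages = [{"role": "user", "content": "Summarize safe password practices."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```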