ahczhg/Llama-3.2-1B-Aegis-SFT-DPO is a 1.23 billion parameter Llama 3.2 model fine-tuned by ahczhg for content-safe instruction following. It was trained in two stages, Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO), on the NVIDIA Aegis AI Content Safety Dataset 2.0, and is optimized to generate safe, aligned responses. It is well suited to educational tools, content safety research, and prototype development that requires safety-aware text generation.
Llama-3.2-1B-Aegis-SFT-DPO: Content-Safe Instruction Following
This model is a 1.23 billion parameter variant of Meta's Llama 3.2, developed by ahczhg and fine-tuned for content-safe instruction following. It uses a two-stage training methodology: Supervised Fine-Tuning (SFT) to improve instruction adherence, followed by Direct Preference Optimization (DPO) to align responses with human preferences for safety. Training used 500 samples from the NVIDIA Aegis AI Content Safety Dataset 2.0, focusing on responsible AI responses.
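The sketch below shows how such an SFT-then-DPO pipeline is typically wired up with Hugging Face TRL. It is a minimal, hypothetical example: the toy dataset rows, hyperparameters, and output paths are assumptions for illustration, not the author's actual training configuration.

```python
# Hypothetical sketch of a two-stage SFT -> DPO pipeline with TRL.
# Toy data, hyperparameters, and paths are illustrative assumptions.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

base = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

peft_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")

# Stage 1: supervised fine-tuning on instruction/response text.
sft_data = Dataset.from_dict({
    "text": [
        "### Instruction: Explain what phishing is.\n"
        "### Response: Phishing is a social-engineering attack ...",
    ],
})
sft_trainer = SFTTrainer(
    model=base,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="sft-out", max_steps=10),
    peft_config=peft_config,
)
sft_trainer.train()

# Stage 2: DPO on (prompt, chosen, rejected) preference triples,
# where "chosen" is the safer, policy-compliant response.
dpo_data = Dataset.from_dict({
    "prompt": ["How can I read someone else's email?"],
    "chosen": [
        "I can't help with accessing accounts you don't own, but I can "
        "explain how to secure your own email ..."
    ],
    "rejected": ["First, find their password by ..."],
})
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,  # continue from the SFT LoRA adapters
    args=DPOConfig(output_dir="dpo-out", beta=0.1, max_steps=10),
    train_dataset=dpo_data,
    processing_class=tokenizer,
)
dpo_trainer.train()
```

With a PEFT model and no explicit reference model, TRL's DPOTrainer uses the base weights with adapters disabled as the implicit reference, which keeps memory use low on a 1B-class model.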
Key Capabilities
- Enhanced Content Safety: Optimized to provide safe and aligned responses, reducing the generation of problematic content.
- Instruction Following: Improved ability to understand and execute user instructions effectively.
- Efficient Fine-Tuning: Uses parameter-efficient fine-tuning via LoRA (Low-Rank Adaptation), with only ~0.5% of parameters trainable, making it resource-friendly (see the sketch after this list).
- Small Footprint: At ~1.2B parameters, it balances capability and computational efficiency.
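As a rough illustration of that trainable-parameter budget, the hypothetical LoRA configuration below (rank, alpha, and target modules are assumptions, not the card's documented settings) lands in the ~0.5% range on a Llama 3.2 1B base:

```python
# Hypothetical LoRA setup illustrating the ~0.5% trainable-parameter figure.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    # Rank-8 adapters on all attention and MLP projections.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# e.g. trainable params: ~5.6M || all params: ~1.24B || trainable%: ~0.45
```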
Good For
- Educational Tools: Generating safe and informative content for learning environments.
- Content Safety Research: Prototyping and studying AI alignment and safety mechanisms.
- Prototype Development: Building conversational AI systems where safety is a primary concern.
- General Instruction Following: Tasks requiring reliable and safety-aware text generation.
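For prototyping, a minimal inference sketch with transformers might look like the following; the chat-template usage and generation settings are assumed defaults rather than documented settings for this model:

```python
# Minimal inference sketch; generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ahczhg/Llama-3.2-1B-Aegis-SFT-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumes the tokenizer ships the Llama 3.2 chat template.
messages = [{"role": "user", "content": "Summarize safe password practices."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```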