kmseong/llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5_2
The kmseong/llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5_2 model is a 7-billion-parameter Llama 2-based language model fine-tuned for safety alignment using a 3-phase Safety-First WaRP (Weight space Rotation Process) pipeline. It aims to preserve refusal behavior on harmful requests while improving utility on reasoning tasks, making it suitable for applications that need both robust content moderation and reliable responses.
Model Overview
This model, kmseong/llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5_2, is a 7-billion-parameter Llama 2-based language model that has undergone specialized fine-tuning for safety alignment. It uses a 3-phase Safety-First WaRP (Weight space Rotation Process) pipeline to strengthen its handling of harmful requests while preserving utility on general tasks.
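The checkpoint can presumably be loaded with the standard `transformers` API. The sketch below is an assumption based on typical Llama 2 chat usage, not verified against this repository; in particular, the `[INST]`/`<<SYS>>` prompt format is the stock Llama 2 chat template and may differ from what this fine-tune expects.

```python
MODEL_ID = "kmseong/llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5_2"


def build_chat_prompt(system: str, user: str) -> str:
    """Wrap one system/user turn in the stock Llama 2 chat format (assumed)."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


def main() -> None:
    # transformers/torch imported lazily so the helper above has no heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = build_chat_prompt(
        "You are a helpful, safe assistant.",
        "Explain the difference between a list and a tuple in Python.",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```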
Key Capabilities
- Enhanced Safety Alignment: Utilizes a novel WaRP method to protect safety mechanisms through gradient masking, ensuring the model maintains refusal capabilities for harmful content.
- Balanced Safety-Utility Tradeoff: Designed to improve performance on reasoning tasks (such as GSM8K) while safeguarding its safety features, rather than optimizing for either safety or utility alone.
- Targeted Fine-tuning: The training procedure involved constructing an orthonormal basis from safety data, scoring neuron importance, and then incrementally learning utility tasks with gradient masking to protect critical safety directions.
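The three phases above (building an orthonormal basis from safety data, scoring importance, then masking utility-task gradients along protected directions) can be sketched in miniature with NumPy. This is a toy illustration of the general idea, not the actual WaRP implementation; the shapes, the singular-value importance score, and the 0.05 threshold are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 8 safety-task gradient vectors in a 16-dim weight space.
safety_grads = rng.normal(size=(8, 16))

# Phase 1: orthonormal basis spanning the safety-gradient subspace (via SVD).
_, s, vt = np.linalg.svd(safety_grads, full_matrices=False)
basis = vt  # rows are orthonormal directions in weight space

# Phase 2: score each direction's importance (here: share of spectral energy).
importance = s**2 / np.sum(s**2)
protected = basis[importance > 0.05]  # assumed threshold, illustrative only

# Phase 3: mask a utility-task gradient by removing its projection onto the
# protected safety directions before applying the update.
utility_grad = rng.normal(size=16)
masked_grad = utility_grad - protected.T @ (protected @ utility_grad)
```

After masking, the update has no component along the protected directions, so learning the utility task cannot move the weights along those safety-critical axes.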
Good For
- Applications requiring a strong emphasis on content moderation and safe AI interactions.
- Use cases where maintaining refusal capabilities for inappropriate prompts is crucial.
- Scenarios demanding a balance between model utility and robust safety mechanisms.