kmseong/llama2_7b_chat-MBPP-FT-lr5e-5
The kmseong/llama2_7b_chat-MBPP-FT-lr5e-5 model is a 7-billion-parameter language model based on Llama 2, fine-tuned for improved safety alignment using a Weight-space Rotation Process (WaRP). The model preserves its ability to refuse harmful requests while improving utility on reasoning tasks. It is designed to balance safety and performance, making it suitable for applications that require robust content moderation and reliable output generation.
Model Overview
This model, kmseong/llama2_7b_chat-MBPP-FT-lr5e-5, is a 7-billion-parameter variant of the Llama 2 chat architecture, fine-tuned for enhanced safety alignment. It uses a Weight-space Rotation Process (WaRP), a three-phase pipeline designed to protect safety mechanisms while improving utility.
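Since this is a Llama 2 chat variant, it can presumably be loaded with the standard transformers API. The sketch below is an assumption based on that lineage, not code from the model authors; the dtype, device placement, and generation settings are illustrative choices:

```python
MODEL_ID = "kmseong/llama2_7b_chat-MBPP-FT-lr5e-5"

def build_prompt(user_msg, system_msg="You are a helpful, safe assistant."):
    # Standard Llama 2 chat format: system prompt in <<SYS>> tags inside [INST].
    return f"[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg} [/INST]"

def generate(user_msg, max_new_tokens=256):
    # Heavy: downloads ~13 GB of weights; run only where resources allow.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tok(build_prompt(user_msg), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```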
Key Capabilities & Features
- Safety-First WaRP Alignment: Trained with the WaRP pipeline so that safety behavior survives downstream fine-tuning.
- Protected Refusal Capability: Maintains the ability to refuse harmful requests effectively.
- Improved Utility: Enhances performance on reasoning tasks, balancing safety with practical application.
- Gradient Masking: Masks gradient updates along important neuron directions during fine-tuning to preserve safety behavior.
- Balanced Safety-Utility Tradeoff: Aims to provide a model that is both safe and useful for various applications.
Training Details
The model's training involved three phases:
- Basis Construction: Identify important neuron directions in the FFN layers by applying SVD to activations on safety data.
- Importance Scoring: Compute gradient-based importance scores over those directions and derive masks from them.
- Incremental Learning: Fine-tune on utility tasks (e.g., GSM8K) with gradient masking, improving performance while preserving safety.
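The three phases above can be sketched on a toy linear layer. This is an illustrative reconstruction, not the authors' code: the SVD basis, the thresholding rule, and the gradient-projection step are all assumptions about how such a pipeline could work.

```python
import torch

torch.manual_seed(0)
ffn = torch.nn.Linear(16, 16)  # stand-in for one FFN projection

# Phase 1: basis construction -- SVD over activations on (toy) "safety" inputs.
safety_acts = ffn(torch.randn(32, 16)).detach()
U, S, Vt = torch.linalg.svd(safety_acts, full_matrices=False)
basis = Vt  # rows are orthonormal directions in activation space

# Phase 2: importance scoring -- gradient magnitude per basis direction,
# thresholded into a mask (True = free to train, False = protected).
x, y = torch.randn(32, 16), torch.randn(32, 16)
loss = torch.nn.functional.mse_loss(ffn(x), y)
loss.backward()
scores = (ffn.weight.grad @ basis.T).abs().sum(dim=0)
mask = scores < scores.median()  # protect the higher-scoring directions

# Phase 3: incremental learning -- zero the gradient components that fall
# along protected directions before each optimizer step.
opt = torch.optim.SGD(ffn.parameters(), lr=1e-2)
for _ in range(3):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(ffn(x), y)  # toy "utility" objective
    loss.backward()
    g = ffn.weight.grad @ basis.T   # gradient in basis coordinates
    g = g * mask                    # kill updates along protected directions
    ffn.weight.grad = g @ basis     # back to weight coordinates
    opt.step()                      # (bias left unmasked in this sketch)
```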
Datasets Used
- Safety Data: LibrAI/do-not-answer
- Utility Data: openai/gsm8k
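Both datasets are hosted on the Hugging Face Hub and can be pulled with the `datasets` library; the split choice and the gsm8k `"main"` config below are assumptions about how they would be used:

```python
# Hub ids taken from the dataset list above; (repo_id, config) pairs.
DATASETS = {
    "safety": ("LibrAI/do-not-answer", None),
    "utility": ("openai/gsm8k", "main"),
}

def load_warp_data():
    # Requires network access to the Hugging Face Hub.
    from datasets import load_dataset

    safety = load_dataset("LibrAI/do-not-answer", split="train")
    utility = load_dataset("openai/gsm8k", "main", split="train")
    return safety, utility
```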
Ideal Use Cases
This model is particularly well-suited for applications where:
- Content Moderation is critical, requiring a high degree of safety and refusal capability.
- Reasoning Tasks need reliable and safe outputs.
- A balanced approach to safety and utility is preferred over models optimized solely for performance or safety.