Name: kmseong/Llama-3.1-8B-base-gsm8k-SSFT_lr5e-5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kmseong

Overview

This model, kmseong/Llama-3.1-8B-base-gsm8k-SSFT_lr5e-5, is an 8-billion-parameter Llama 3.1 Instruct model fine-tuned by Min-Seong Kim. It is trained with the novel Safety-First Weight space Rotation Process (WaRP), a three-phase pipeline designed to strengthen safety alignment while preserving and even improving utility on reasoning tasks.

Key Capabilities

Enhanced Safety Alignment: Achieved through WaRP, which protects safety-critical directions in the weights via gradient masking.
Refusal Capability: Maintains the ability to refuse harmful requests, a core aspect of its safety design.
Improved Utility on Reasoning Tasks: Specifically fine-tuned on the GSM8K dataset, demonstrating improved performance on mathematical reasoning while balancing safety.
Balanced Safety-Utility Tradeoff: The WaRP process ensures that improvements in utility do not compromise the model's safety features.

Training Methodology

The WaRP process has three phases:

Basis Construction: Identifying important neurons in the feed-forward (FFN) layers using safety data and singular value decomposition (SVD).
Importance Scoring: Calculating gradient-based importance scores and generating masks for the most safety-critical directions.
Incremental Learning: Fine-tuning on utility tasks (like GSM8K) with gradient masking to protect identified safety-critical directions.
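The three phases can be illustrated for a single FFN weight matrix with the minimal NumPy sketch below. It is not the author's implementation: the function names (`build_basis`, `importance_mask`, `masked_update`), the use of layer-input activations for the SVD, the |w * g| importance score, the `keep_ratio` parameter, and the plain SGD update are all hypothetical choices made for illustration.

```python
import numpy as np


def build_basis(activations: np.ndarray) -> np.ndarray:
    """Phase 1, basis construction (sketch).

    `activations` is an (n_samples, d_in) matrix of inputs to one FFN weight,
    assumed to be recorded while running the model on safety prompts.
    Returns a (d_in, d_in) orthonormal matrix whose columns are the right
    singular vectors, ordered by decreasing singular value.
    """
    # full_matrices=True keeps V square even when n_samples < d_in
    _, _, vt = np.linalg.svd(activations, full_matrices=True)
    return vt.T


def importance_mask(weight: np.ndarray, safety_grad: np.ndarray,
                    basis: np.ndarray, keep_ratio: float = 0.1) -> np.ndarray:
    """Phase 2, importance scoring (sketch).

    Expresses the weight and a safety-loss gradient in the rotated basis,
    scores each coefficient by |w * g| (an assumed first-order proxy), and
    marks the top `keep_ratio` fraction as protected.
    """
    w_rot = weight @ basis
    g_rot = safety_grad @ basis
    scores = np.abs(w_rot * g_rot)
    k = max(1, int(keep_ratio * scores.size))
    # k-th largest score; every coefficient at or above it is protected
    threshold = np.partition(scores.ravel(), -k)[-k]
    return scores >= threshold


def masked_update(weight: np.ndarray, utility_grad: np.ndarray,
                  basis: np.ndarray, mask: np.ndarray,
                  lr: float = 5e-5) -> np.ndarray:
    """Phase 3, incremental learning (sketch): one masked SGD step.

    The default lr mirrors the "lr5e-5" suffix in the model name; treating
    it as the actual training setting is an assumption.
    """
    g_rot = utility_grad @ basis  # utility gradient in the rotated basis
    g_rot[mask] = 0.0             # freeze safety-critical coordinates
    # Rotate back; basis is orthonormal, so basis.T is its inverse
    return weight - lr * (g_rot @ basis.T)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(32, 6))  # stand-in for safety-prompt activations
    W = rng.normal(size=(8, 6))      # stand-in for an FFN weight matrix
    V = build_basis(acts)
    mask = importance_mask(W, rng.normal(size=W.shape), V, keep_ratio=0.25)
    W_new = masked_update(W, rng.normal(size=W.shape), V, mask, lr=0.1)
    # Protected coefficients are unchanged in the rotated basis
    print(np.allclose((W_new @ V)[mask], (W @ V)[mask]))  # True
```

Because the basis is orthonormal, zeroing the masked gradient coordinates before rotating back leaves exactly those coefficients of the rotated weight untouched, while every other coefficient is free to move during utility fine-tuning.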

Datasets Used

Safety Data: LibrAI/do-not-answer
Utility Data: openai/gsm8k
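GSM8K reference solutions end with a `#### <answer>` line, so checking mathematical-reasoning outputs against the utility data usually comes down to comparing final numbers. The helper below is a hypothetical sketch, not the evaluation used for this model: `gsm8k_final_answer`, `last_number`, and `is_correct` are illustrative names, and falling back to the last number in free-form model output is an assumed heuristic.

```python
import re
from typing import Optional


def gsm8k_final_answer(solution: str) -> Optional[str]:
    """Return the number after the '####' marker, with thousands separators removed."""
    match = re.search(r"####\s*([-+]?[\d,]*\.?\d+)", solution)
    if match is None:
        return None
    return match.group(1).replace(",", "")


def last_number(text: str) -> Optional[str]:
    """Fallback for free-form output: the last number in the text (assumed heuristic)."""
    numbers = re.findall(r"[-+]?\d[\d,]*\.?\d*", text)
    if not numbers:
        return None
    # Drop separators and a sentence-final period such as "72."
    return numbers[-1].replace(",", "").rstrip(".")


def is_correct(model_output: str, reference_solution: str) -> bool:
    """Compare the model's final number with the GSM8K reference answer."""
    ref = gsm8k_final_answer(reference_solution)
    pred = gsm8k_final_answer(model_output) or last_number(model_output)
    return ref is not None and pred is not None and float(pred) == float(ref)
```

For example, with a reference ending in `#### 72`, an output of "She sold 72 clips in total." is counted as correct.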

Good For

Applications requiring a strong emphasis on safety and refusal of harmful content.
Tasks involving mathematical reasoning and problem-solving where safety is also a priority.
Developers looking for a Llama 3.1 variant with explicit safety alignment without significant degradation in utility.
