Name: kmseong/Llama-3.1-8B-base-gsm8k-SSFT_lr1e-5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kmseong

Overview

This model, kmseong/Llama-3.1-8B-base-gsm8k-SSFT_lr1e-5, is an 8-billion-parameter Llama 3.1 Instruct model fine-tuned by kmseong. Its main contribution is the use of a Safety-First Weight space Rotation Process (WaRP), a three-phase training pipeline designed to balance safety alignment against task utility.

Key Capabilities

Enhanced Safety Alignment: Uses WaRP to construct an orthonormal basis from safety data, identify important neurons, and apply gradient masking during fine-tuning. This is intended to protect the model's safety mechanisms and preserve its ability to refuse harmful requests.
Improved Reasoning Utility: Alongside the safety constraints, the model is incrementally fine-tuned on a utility task, the openai/gsm8k dataset, with the aim of improving its mathematical reasoning.
Balanced Safety-Utility Tradeoff: WaRP is designed so that utility gains do not come at the expense of the model's safety features, yielding a model intended to be both safe and more capable in specific reasoning domains.
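Why gradient masking can protect safety behavior is easiest to see in a single linear layer. The derivation below is a minimal sketch that assumes "masking" means removing the gradient components along the protected basis directions of the layer's input space; this is our simplification, not necessarily the exact WaRP formulation. Here $W$ is the layer weight, $G_u$ the utility-task gradient, $\eta$ the learning rate, and $V_P$ the matrix whose orthonormal columns are the protected directions (all symbols introduced for illustration).

```latex
\begin{aligned}
W' &= W - \eta\, G_u \left(I - V_P V_P^{\top}\right) \\
W' v &= W v - \eta\, G_u \left(v - V_P V_P^{\top} v\right) = W v
\qquad \text{for every } v \in \operatorname{span}(V_P),
\end{aligned}
```

The second line holds because $V_P V_P^{\top}$ is the orthogonal projector onto $\operatorname{span}(V_P)$, so it leaves such $v$ unchanged. The argument only covers that layer's response to inputs in the protected subspace; other layers and inputs outside it can still change.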

Training Details

The training involved three phases:

Basis Construction: Collected activations from the FFN layers on the LibrAI/do-not-answer safety dataset and computed their SVD to obtain an orthonormal basis used to identify important neurons.
Importance Scoring: Calculated gradient-based importance scores and generated masks for critical directions.
Incremental Learning: Fine-tuned on the openai/gsm8k utility task with gradient masking to protect identified safety-critical directions.
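The three phases can be sketched for a single linear layer. This is a minimal NumPy sketch under our own assumptions, not the author's implementation: the basis is taken as the right singular vectors of the collected activations, a direction's importance is the norm of the safety-loss gradient projected onto it, the top `keep` directions are protected, and the helper names `build_basis`, `importance_mask`, and `masked_step` are hypothetical. Random arrays stand in for real activations and gradients.

```python
import numpy as np

# Hypothetical helpers: a simplified single-layer sketch, not the WaRP reference code.

def build_basis(activations: np.ndarray) -> np.ndarray:
    """Phase 1: orthonormal basis of the layer's input space.

    activations: (num_samples, d_in) layer inputs collected on safety prompts.
    Returns a (d_in, d_in) matrix whose columns are the right singular vectors,
    ordered by decreasing singular value.
    """
    _, _, vt = np.linalg.svd(activations, full_matrices=True)
    return vt.T

def importance_mask(safety_grad: np.ndarray, basis: np.ndarray, keep: int) -> np.ndarray:
    """Phase 2: score each basis direction and mark the top `keep` as protected.

    Assumed score: L2 norm of the safety-loss gradient (d_out, d_in) along each direction.
    """
    scores = np.linalg.norm(safety_grad @ basis, axis=0)
    mask = np.zeros(basis.shape[1], dtype=bool)
    mask[np.argsort(scores)[::-1][:keep]] = True
    return mask

def masked_step(weight, utility_grad, basis, mask, lr):
    """Phase 3: one SGD step on the utility gradient with protected directions frozen."""
    rotated = utility_grad @ basis            # gradient expressed in basis coordinates
    rotated[:, mask] = 0.0                    # zero the components along protected directions
    return weight - lr * (rotated @ basis.T)  # rotate back and apply the update

rng = np.random.default_rng(0)
d_in, d_out = 16, 8
acts = rng.normal(size=(64, d_in))            # stand-in for FFN activations on safety data
W = rng.normal(size=(d_out, d_in))            # stand-in for one layer's weight
V = build_basis(acts)
mask = importance_mask(rng.normal(size=(d_out, d_in)), V, keep=4)
W_new = masked_step(W, rng.normal(size=(d_out, d_in)), V, mask, lr=0.1)

# The layer's response to inputs along protected directions is unchanged
print(np.allclose(W_new @ V[:, mask], W @ V[:, mask]))  # True
```

Zeroing coordinates in the rotated basis is equivalent to projecting the gradient onto the orthogonal complement of the protected subspace; working in the rotated basis simply turns the mask into a boolean vector over basis directions.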

Good For

Applications requiring a strong emphasis on safety and refusal of harmful content.
Use cases where mathematical and general reasoning capabilities are important, particularly those benefiting from GSM8K-like problem-solving.
Developers looking for a Llama 3.1 8B Instruct model with explicit safety alignment without significant degradation in utility.
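For GSM8K-style use, reference solutions in openai/gsm8k end with the final answer after a `####` marker. Assuming the model reproduces that convention in its outputs (the card does not confirm this), a small helper like the hypothetical `extract_final_answer` below can pull out the answer for scoring. It is a sketch, not part of the model's API.

```python
import re
from typing import Optional

# Matches an optional minus sign, digits possibly grouped with commas, and an optional decimal part.
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")

def extract_final_answer(text: str) -> Optional[str]:
    """Return the number after the last '####' marker (commas removed), or None if absent."""
    matches = _FINAL_ANSWER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")

sample = "Natalia sold 48/2 = 24 clips in May.\nShe sold 48 + 24 = 72 clips in total.\n#### 72"
print(extract_final_answer(sample))  # 72
```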
