Name: kmseong/Llama-3.1-8B-base-gsm8k-warp-lr5e-5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kmseong

Overview

kmseong/Llama-3.1-8B-base-gsm8k-warp-lr5e-5 is an 8-billion-parameter Llama 3.1 Instruct model fine-tuned by kmseong. It was trained with Safety-First WaRP (Weight space Rotation Process), a three-phase pipeline designed to strengthen safety alignment while preserving utility.

Key Capabilities

Enhanced Safety Alignment: Uses WaRP gradient masking to protect safety mechanisms, so the model keeps its ability to refuse harmful requests.
Improved Reasoning Utility: Fine-tuned on the openai/gsm8k dataset to improve performance on utility tasks, particularly mathematical reasoning.
Balanced Safety-Utility Tradeoff: Balances safety and task performance, so the model stays useful for general tasks while remaining robust against generating unsafe content.
Gradient-Based Neuron Protection: Identifies the neurons most important to safety and protects them during incremental learning, preventing safety features from degrading.
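
The reasoning-utility item above refers to GSM8K, whose reference solutions end with a final answer after a "####" marker. The following is a minimal sketch of how exact-match accuracy on such data could be scored. The function names and the "take the last number in the completion" heuristic are assumptions made for illustration; this is not the evaluation procedure used by the model author.

```python
import re
from typing import Optional, Sequence

# Matches integers and decimals, optionally negative, with thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def _to_number(text: str) -> float:
    """Strip whitespace and thousands separators, then parse as a float."""
    return float(text.strip().replace(",", ""))


def reference_answer(solution: str) -> float:
    """Return the final answer that follows the '####' marker in a GSM8K solution."""
    return _to_number(solution.split("####")[-1])


def predicted_answer(completion: str) -> Optional[float]:
    """Heuristic (an assumption): treat the last number in the completion as the answer."""
    matches = _NUMBER.findall(completion)
    return _to_number(matches[-1]) if matches else None


def exact_match(completions: Sequence[str], solutions: Sequence[str]) -> float:
    """Fraction of completions whose extracted answer equals the reference answer."""
    hits = sum(
        predicted_answer(c) == reference_answer(s)
        for c, s in zip(completions, solutions)
    )
    return hits / len(solutions)
```

For example, `exact_match(model_outputs, dataset_answers)` would take a list of generated completions and the matching list of reference solutions; both variable names are placeholders.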

Training Details

Training ran in three phases: (1) Basis Construction, which uses safety data and SVD to identify important neurons; (2) Importance Scoring, which applies gradient-based methods; and (3) Incremental Learning, which fine-tunes the model on GSM8K with gradient masking to protect safety-critical directions. The safety data came from LibrAI/do-not-answer and the utility data from openai/gsm8k.
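
The three phases above can be illustrated on a single toy linear layer with NumPy. This is a sketch under stated assumptions, not the author's implementation: it assumes the basis comes from an SVD of the layer's input activations on safety data, that importance is the mean absolute safety gradient in that rotated basis, and that a fixed fraction of coefficients is frozen. All function and parameter names (`build_basis`, `importance_scores`, `trainable_mask`, `masked_step`, `protect_ratio`) are hypothetical.

```python
import numpy as np


def build_basis(safety_activations: np.ndarray) -> np.ndarray:
    """Phase 1: orthonormal basis from the SVD of safety activations (rows = examples)."""
    _, _, vt = np.linalg.svd(safety_activations, full_matrices=True)
    return vt.T  # columns are right singular vectors, shape (in_dim, in_dim)


def importance_scores(safety_grads, basis: np.ndarray) -> np.ndarray:
    """Phase 2: mean |gradient| per weight coefficient after rotating into the basis."""
    rotated = np.stack([g @ basis for g in safety_grads])  # each g: (out_dim, in_dim)
    return np.abs(rotated).mean(axis=0)


def trainable_mask(scores: np.ndarray, protect_ratio: float) -> np.ndarray:
    """Return 0.0 for the top `protect_ratio` fraction of coefficients, 1.0 elsewhere."""
    k = int(round(scores.size * protect_ratio))
    if k == 0:
        return np.ones_like(scores)
    threshold = np.sort(scores, axis=None)[-k]  # k-th largest score
    return (scores < threshold).astype(scores.dtype)


def masked_step(weight, utility_grad, basis, mask, lr):
    """Phase 3: one SGD step on utility data that leaves frozen coefficients untouched."""
    coeff = weight @ basis             # weights expressed in the rotated basis
    coeff_grad = utility_grad @ basis  # chain rule: dL/dC = dL/dW @ V
    coeff = coeff - lr * mask * coeff_grad
    return coeff @ basis.T             # rotate back to the original weight space
```

Because the basis is orthogonal, rotating into it and back is lossless, so the only effect of the mask is to block updates along the directions scored as most important for safety.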

Ideal Use Cases

The model suits applications that need both robust safety alignment and strong reasoning. Examples include content moderation, safe AI assistants, and educational tools that must solve mathematical problems reliably while adhering to safety guidelines.
