Name: kmseong/llama2_7b_chat-WaRP-original-space-gsm8k-lr5e-5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kmseong

Overview

This model, kmseong/llama2_7b_chat-WaRP-original-space-gsm8k-lr5e-5, is a 7 billion parameter Llama 3.1 Instruct variant developed by Min-Seong Kim. It has been fine-tuned using a novel Safety-First WaRP (Weight space Rotation Process), a three-phase pipeline designed to enhance safety alignment without significantly compromising utility.

Key Capabilities

Enhanced Safety Alignment: Utilizes a unique WaRP method to protect safety mechanisms, ensuring the model maintains refusal capabilities for harmful requests.
Improved Reasoning Utility: Fine-tuned on the GSM8K dataset, demonstrating improved performance on mathematical reasoning tasks.
Balanced Safety-Utility Tradeoff: Achieves a balance between safety and performance by employing gradient masking during incremental learning to preserve important safety-related directions.
Gradient-based Neuron Importance: Identifies and protects critical neurons (e.g., 419 neurons in layer 31) related to safety during the fine-tuning process.

Good For

Applications requiring a safety-aligned LLM that can effectively handle harmful prompts.
Use cases where mathematical reasoning and problem-solving are important, alongside safety.
Developers looking for a model that has undergone a structured process to balance ethical considerations with performance.

This model was trained using safety data from LibrAI/do-not-answer and utility data from openai/gsm8k, building upon the meta-llama/Llama-3.1-8B-Instruct base model.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)