kmseong/llama3_2_3b-instruct-SSFT-lr5e-5
The kmseong/llama3_2_3b-instruct-SSFT-lr5e-5 model is a 3-billion-parameter instruction-tuned causal language model based on the Llama 3.2 architecture. It was fine-tuned with a safety-first Weight-space Rotation Process (WaRP) to strengthen safety alignment, and is designed to maintain refusal behavior on harmful requests while improving utility on reasoning tasks, offering a balanced safety-utility tradeoff.
Model Overview
The kmseong/llama3_2_3b-instruct-SSFT-lr5e-5 model is a 3-billion-parameter instruction-tuned variant of Llama 3.2, published by kmseong. Its distinguishing feature is the safety-first WaRP (Weight-space Rotation Process), a three-phase training pipeline designed to achieve robust safety alignment.
Key Capabilities & Training
The model was fine-tuned in three phases:
- Basis Construction: Activations from the FFN layers were collected on safety data, and SVD was applied to identify 419 important neurons in layer 31.
- Importance Scoring: Gradient-based importance scores were computed with teacher forcing on safety responses, producing masks over the critical directions (the first sketch after this list illustrates these two phases).
- Incremental Learning: The model was then fine-tuned on utility tasks (such as GSM8K) with gradient masking, which protects the identified important directions so that safety mechanisms are preserved while general utility improves (second sketch below).
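
The card does not ship the WaRP training code, so the first two phases are easiest to see in a hedged sketch. Assumptions not taken from the source: activations are gathered as a (tokens × hidden) matrix, importance is the squared gradient of a teacher-forced safety loss projected onto each SVD direction, and every helper name (`build_safety_basis`, `score_directions`, `make_mask`) is hypothetical.

```python
# Minimal sketch of WaRP phases 1-2 (basis construction and importance
# scoring). Hypothetical helper names; not the author's released code.
import torch

@torch.no_grad()
def build_safety_basis(ffn_acts: torch.Tensor) -> torch.Tensor:
    """ffn_acts: (num_tokens, hidden_dim) activations collected on
    safety data. Returns an orthonormal direction basis via SVD."""
    # Rows of Vh are right singular vectors: directions in activation
    # space, ordered by how much safety-data variance they explain.
    _, _, vh = torch.linalg.svd(ffn_acts, full_matrices=False)
    return vh

def score_directions(model, basis, safety_batches, ffn_weight):
    """Gradient-based importance: accumulate the squared gradient of a
    teacher-forced safety loss along each basis direction."""
    scores = torch.zeros(basis.shape[0])
    for batch in safety_batches:
        model.zero_grad()
        # Teacher forcing: the reference safety response is the label.
        loss = model(input_ids=batch["input_ids"],
                     attention_mask=batch["attention_mask"],
                     labels=batch["labels"]).loss
        loss.backward()
        g = ffn_weight.grad              # (out_dim, hidden_dim)
        scores += (g @ basis.T).pow(2).sum(dim=0)
    return scores

def make_mask(scores: torch.Tensor, k: int = 419) -> torch.Tensor:
    """Boolean mask over directions; the card reports 419 protected
    neurons in layer 31."""
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[scores.topk(k).indices] = True
    return mask
```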
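
Phase 3 can then be expressed as an ordinary fine-tuning loop whose gradients are edited before each optimizer step: components lying in the protected subspace are projected out, so updates cannot move the weights along safety-critical directions. This is again a sketch under the same assumptions (orthonormal basis rows, a single masked FFN weight); the actual pipeline may mask more tensors.

```python
# Minimal sketch of WaRP phase 3: gradient-masked fine-tuning on utility
# data such as GSM8K. Hypothetical structure, not the released code.
import torch

def project_out_protected(grad, basis, mask):
    """Remove gradient components along protected directions so the
    optimizer cannot move the weights in the safety-critical subspace."""
    protected = basis[mask]           # (k, hidden_dim), orthonormal rows
    coeffs = grad @ protected.T       # (out_dim, k) projection coefficients
    return grad - coeffs @ protected  # keep only the orthogonal part

def masked_step(model, batch, optimizer, ffn_weight, basis, mask):
    optimizer.zero_grad()
    loss = model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"],
                 labels=batch["labels"]).loss
    loss.backward()
    # Edit the gradient in place before the optimizer consumes it.
    ffn_weight.grad = project_out_protected(ffn_weight.grad, basis, mask)
    optimizer.step()
    return loss.item()
```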
Safety Features & Use Cases
The WaRP methodology ensures that the model:
- Protects safety mechanisms through gradient masking.
- Maintains refusal capability for harmful requests.
- Improves utility on reasoning tasks, as demonstrated by its training on the openai/gsm8k dataset.
- Achieves a balanced safety-utility tradeoff, making it suitable for applications where both performance and responsible AI behavior are critical. It leverages safety data from LibrAI/do-not-answer to enhance its refusal capabilities.
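
The card itself does not include inference code; since the checkpoint is a Llama 3.2 instruct variant, the standard transformers chat-template pattern should apply. The generation settings below are illustrative, and `device_map="auto"` assumes accelerate is installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kmseong/llama3_2_3b-instruct-SSFT-lr5e-5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user",
             "content": "A train travels 60 km in 1.5 hours. What is its average speed?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```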