Name: kmseong/Llama-3.1-8B-base-gsm8k-safeinstr-lr5e-5-ratio0.1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kmseong

Overview

This model, kmseong/Llama-3.1-8B-base-gsm8k-safeinstr-lr5e-5-ratio0.1, is an 8-billion-parameter instruction-tuned variant of Meta's Llama 3.1, developed by kmseong. Its main distinction is its safety alignment, obtained through Safety-First WaRP (Weight space Rotation Process), a three-phase training pipeline. The method aims to protect the model's safety mechanisms while improving performance on utility tasks.

Key Capabilities

Enhanced Safety Alignment: Uses the WaRP process to preserve refusal behavior on harmful requests.
Improved Reasoning Utility: Fine-tuned on the GSM8K dataset of grade-school math word problems, with the goal of improving mathematical and general reasoning.
Balanced Safety-Utility Tradeoff: Designed to enforce safety without degrading utility performance, by masking gradients during incremental learning.
Robust Training Methodology: Proceeds in three phases: basis construction from safety data, gradient-based importance scoring, and incremental learning with protected directions.
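
The gradient-masking idea behind the last two items can be illustrated with a small sketch. This is not the author's WaRP implementation: the function name mask_gradient, the flat-vector representation, and the boolean protected mask are all hypothetical, and the basis is assumed to be orthonormal and already constructed. The sketch only shows the general pattern of expressing a gradient in a rotated basis, zeroing the components along protected directions, and rotating back.

```python
# Illustrative sketch only; not the author's WaRP code.
# Assumes an orthonormal basis and a precomputed "protected" mask.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mask_gradient(grad, basis, protected):
    """Remove the gradient components that lie along protected directions.

    grad      -- flat gradient vector (list of floats)
    basis     -- orthonormal basis vectors spanning the space (list of lists)
    protected -- list of bools, True where a direction is treated as safety-critical
    """
    # Express the gradient in the rotated basis.
    coeffs = [dot(grad, b) for b in basis]
    # Zero the coefficients along protected directions.
    kept = [0.0 if p else c for c, p in zip(coeffs, protected)]
    # Rotate back to the original weight coordinates.
    return [sum(k * b[i] for k, b in zip(kept, basis)) for i in range(len(grad))]

# Toy example: a 2-D basis rotated by 45 degrees, first direction protected.
s = 2 ** -0.5
basis = [[s, s], [s, -s]]
masked = mask_gradient([1.0, 0.0], basis, [True, False])
```

In the toy example the masked gradient has no component left along the protected direction, so an update step using it would leave that direction of the weights unchanged.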

Good For

Applications requiring a strong balance between content safety and general reasoning capabilities.
Use cases where refusal of harmful prompts is critical, such as chatbots or content moderation systems.
Tasks involving mathematical problem-solving or logical reasoning, which benefit from the GSM8K fine-tuning.

This model is built upon the meta-llama/Llama-3.1-8B-Instruct base and incorporates safety data from LibrAI/do-not-answer and utility data from openai/gsm8k.
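
For readers who want to check math outputs against openai/gsm8k, the following is a minimal sketch of answer extraction. GSM8K reference solutions end with a line of the form "#### <number>". The helper names extract_gsm8k_answer and is_correct are hypothetical, and the sketch assumes the model's output follows the same "####" convention, which the listing does not confirm.

```python
import re

def extract_gsm8k_answer(solution):
    """Return the final numeric answer after '####' as a string, or None."""
    match = re.search(r"####\s*(-?[\d,]*\.?\d+)", solution)
    if match is None:
        return None
    # Strip thousands separators so "1,000" and "1000" compare equal.
    return match.group(1).replace(",", "")

def is_correct(model_output, reference_solution):
    """Compare final answers numerically; False if either one is missing."""
    predicted = extract_gsm8k_answer(model_output)
    expected = extract_gsm8k_answer(reference_solution)
    if predicted is None or expected is None:
        return False
    return float(predicted) == float(expected)

# Example with a made-up solution string in the GSM8K format.
reference = "She sells 9 eggs at $2 each. 9 * 2 = 18\n#### 18"
answer = extract_gsm8k_answer(reference)
```

If the model reports its final answer in a different form, the regular expression would need to be adapted accordingly.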
