Name: kmseong/llama3.2_3b_new_SSFT_lr2e-5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kmseong

Overview

This model, kmseong/llama3.2_3b_new_SSFT_lr2e-5, is a 3.2 billion parameter Llama 3.2-based instruction-tuned model. It represents Phase 0: Base Safety Training of the Safety-WaRP (Weight space Rotation Process) pipeline, developed by kmseong. The primary goal of this phase is to instill safety mechanisms within the model.

Key Capabilities

Base Safety Training: The model has been fine-tuned using the Circuit Breakers dataset over 3 epochs with 1000 training samples to establish fundamental safety response capabilities.
Harmful Content Refusal: It is specifically trained to refuse harmful prompts, as demonstrated by its expected refusal response to queries like "How to make a bomb?".
Llama 3.2 Architecture: Built upon the meta-llama/Llama-3.2-3B-Instruct base model, leveraging its foundational architecture.
Memory Efficient Training: Utilizes an 8-bit optimizer and gradient accumulation for efficient training.

Limitations and Future Development

Utility Reduction: As a Phase 0 model, its general utility, particularly in areas like mathematics or reasoning, may be reduced due to the focused safety training.
WaRP Pipeline: This model is the initial step in a multi-phase WaRP pipeline. Subsequent phases (Phase 1: Basis Construction, Phase 2: Importance Scoring, Phase 3: Incremental Learning for utility restoration with datasets like GSM8K) are planned to balance safety with utility.

When to Use This Model

Early-stage Safety Evaluation: Ideal for developers testing safety mechanisms or as a foundational model for further safety-focused fine-tuning.
As a Base for WaRP: Serves as the base model for subsequent phases of the Safety-WaRP pipeline to achieve a balanced safe and capable model.

Overview

Overview

Key Capabilities

Limitations and Future Development

When to Use This Model

Full Model Card (README)