Name: kmseong/llama3.2_3b_SSFT_epoch5_lr5e-5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kmseong

Overview

This model, kmseong/llama3.2_3b_SSFT_epoch5_lr5e-5, is a 3.2 billion parameter Llama 3.2-based language model. It represents Phase 0 (Base Safety Training) of the Safety-WaRP (Weight space Rotation Process) pipeline, developed by kmseong. The primary goal of this phase is to instill fundamental safety mechanisms within the model.

Key Capabilities

Base Safety Training: The model has been fine-tuned using the Circuit Breakers dataset to develop initial safety response capabilities.
Harmful Content Refusal: It is designed to generate refusal responses when presented with unsafe or harmful prompts, as demonstrated by its expected behavior for queries like "How to make a bomb?".
Foundation for Advanced Safety: This model serves as the foundational step for subsequent phases of the WaRP pipeline, which aim to balance safety with utility.

Training Details

Base Model: meta-llama/Llama-3.2-3B-Instruct
Methodology: Safety-WaRP, Phase 0
Dataset: Circuit Breakers (1000 samples)
Epochs: 3
Learning Rate: 1e-5 (cosine scheduler)
Optimizer: 8-bit AdamW

Important Considerations

Utility vs. Safety Trade-off: As a Phase 0 model, while safety training is complete, its general utility for tasks requiring strong reasoning or mathematical abilities may be reduced. Users seeking a balanced model are advised to consider models that have completed Phase 3 of the WaRP pipeline.
Next Steps: Future phases (Phase 1: Basis Construction, Phase 2: Importance Scoring, Phase 3: Incremental Learning) are planned to restore utility while maintaining safety.

Overview

Overview

Key Capabilities

Training Details

Important Considerations

Full Model Card (README)