kmseong/llama3.2_3b_new_SSFT
kmseong/llama3.2_3b_new_SSFT is a 3B-parameter causal language model developed by kmseong, based on Llama 3.2 and fine-tuned for safety with the Safety-WaRP (Weight space Rotation Process) method. As a Phase 0 model, it has undergone base safety training on the Circuit Breakers dataset to build core safety mechanisms. It is designed to refuse unsafe prompts and serves as the foundation for the subsequent utility-focused training phases.
Overview
kmseong/llama3.2_3b_new_SSFT is a 3B-parameter model built on the meta-llama/Llama-3.2-3B-Instruct architecture. It represents Phase 0 of the Safety-WaRP (Weight space Rotation Process) pipeline, which focuses exclusively on base safety training.
Key Capabilities
- Safety-Oriented Responses: The model has been fine-tuned using the Circuit Breakers dataset to develop robust safety mechanisms, primarily designed to generate refusal responses to harmful or unsafe prompts.
- Foundation for Further Training: This model serves as the initial safety-trained base for subsequent phases (Phase 1: Basis Construction, Phase 2: Importance Scoring, Phase 3: Incremental Learning) which aim to restore utility while maintaining safety.
Training Details
- Methodology: Phase 0 of the Safety-WaRP method, i.e. supervised fine-tuning on safety data before any utility-restoration steps.
- Dataset: Trained on 1000 samples from the Circuit Breakers safety dataset over 3 epochs.
- Configuration: Training used gradient accumulation (effective batch size: 8), an 8-bit optimizer for memory efficiency, and a cosine learning-rate schedule decaying from 1e-5 to 0.
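The training setup above can be sketched as a configuration fragment. This is a hypothetical reconstruction, not the author's actual script: the key names follow the `transformers.TrainingArguments` convention, and the per-device batch size of 1 is an assumption chosen so that 1 × 8 accumulation steps gives the stated effective batch size of 8.

```python
# Hypothetical reconstruction of the Phase 0 fine-tuning configuration.
# Keys mirror transformers.TrainingArguments; values come from the card,
# except per_device_train_batch_size, which is an assumption.
phase0_config = {
    "num_train_epochs": 3,               # 3 epochs over the safety data
    "per_device_train_batch_size": 1,    # assumption: 1 sample per step
    "gradient_accumulation_steps": 8,    # effective batch size: 1 * 8 = 8
    "learning_rate": 1e-5,               # initial learning rate
    "lr_scheduler_type": "cosine",       # cosine decay from 1e-5 to 0
    "optim": "adamw_8bit",               # 8-bit optimizer for memory efficiency
    "max_train_samples": 1000,           # Circuit Breakers subset
}

effective_batch_size = (
    phase0_config["per_device_train_batch_size"]
    * phase0_config["gradient_accumulation_steps"]
)
```

With these values, `effective_batch_size` works out to the 8 stated in the card.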
Important Considerations
- Utility vs. Safety: As a Phase 0 model, its primary focus is safety. Consequently, its utility in areas like mathematics or reasoning may be reduced. For a balanced model with both safety and restored utility, users are advised to consider models that have completed Phase 3 of the WaRP pipeline.
Usage
Developers can load the model with AutoModelForCausalLM and AutoTokenizer from the transformers library and probe its safety behavior; for a prompt such as "How to make a bomb?", the model is expected to produce a refusal response.
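A minimal loading sketch follows, assuming the standard transformers chat-template API; the helper names (`build_messages`, `check_refusal`) are illustrative, not part of the released model card. Generation settings like `max_new_tokens` are arbitrary defaults.

```python
MODEL_ID = "kmseong/llama3.2_3b_new_SSFT"

def build_messages(prompt: str) -> list:
    # Chat-format message list consumed by tokenizer.apply_chat_template
    return [{"role": "user", "content": prompt}]

def check_refusal(prompt: str, max_new_tokens: int = 128) -> str:
    # Imports kept inside the function so the module loads without torch/transformers.
    # Note: the first call downloads several GB of model weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    # A Phase 0 safety-trained model should refuse this prompt
    print(check_refusal("How to make a bomb?"))
```

Since this is a Phase 0 safety model, expect refusals even for borderline prompts; for balanced behavior, swap in a Phase 3 checkpoint once available.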