AmberSafe: A Safety-Aligned Instruction Model
AmberSafe is a 7-billion-parameter instruction-tuned language model developed by LLM360 as part of its Amber model series. It is built on the LLaMA-7B architecture and uses LLM360/AmberChat as its base model. The primary differentiator for AmberSafe is its dedicated safety finetuning, making it particularly suitable for applications where generating safe and appropriate content is paramount.
Key Capabilities
- Safety-Focused Responses: Specifically trained to provide safe and helpful answers, minimizing the generation of harmful or undesirable content.
- Instruction Following: Capable of understanding and executing instructions, building on AmberChat, which is itself an instruction-tuned model.
- English Language Support: Optimized for processing and generating text in English.
Finetuning Details
AmberSafe was finetuned with Direct Preference Optimization (DPO). The training used the PKU-Alignment/PKU-SafeRLHF dataset, filtered so that in every preference pair the chosen response was safe and the rejected one was unsafe. Training on this filtered signal teaches the model to prefer safe completions over unsafe ones.
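The two steps described above, filtering the dataset into safe/unsafe preference pairs and optimizing the DPO objective, can be sketched as follows. This is an illustrative sketch, not LLM360's actual training code: the field names (`response_0`, `is_response_0_safe`, etc.) are assumed to follow the published PKU-SafeRLHF schema, and the loss function shows only the per-example DPO formula, not a full training loop.

```python
import math

def filter_safety_pairs(examples):
    """Keep only pairs where exactly one response is safe.

    Field names mirror the PKU-SafeRLHF schema (an assumption here);
    the safe response becomes `chosen`, the unsafe one `rejected`.
    """
    pairs = []
    for ex in examples:
        safe0, safe1 = ex["is_response_0_safe"], ex["is_response_1_safe"]
        if safe0 == safe1:
            # Both safe or both unsafe: no safety preference signal, drop it.
            continue
        chosen, rejected = (
            (ex["response_0"], ex["response_1"]) if safe0
            else (ex["response_1"], ex["response_0"])
        )
        pairs.append({"prompt": ex["prompt"],
                      "chosen": chosen,
                      "rejected": rejected})
    return pairs

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is a summed token log-probability of the chosen or
    rejected response under the policy or frozen reference model.
    """
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

The loss shrinks as the policy assigns relatively more probability to the safe (chosen) response than the reference model does, which is how DPO steers the model toward safe outputs without a separate reward model.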
Evaluation
On MT-Bench, AmberSafe scored 4.725, compared with 5.428125 for its base model LLM360/AmberChat and 2.4875 for the pretrained LLM360/Amber (checkpoint 359). The drop relative to AmberChat reflects the usual trade-off: some helpfulness as measured by MT-Bench is exchanged for stronger safety alignment, while the model still far outperforms the raw pretrained checkpoint.
Good For
- Applications requiring robust content moderation.
- Chatbots and virtual assistants where safety and politeness are critical.
- Generating responses in sensitive domains where harmful content must be avoided.
- Developers looking for a LLaMA-7B-based model with explicit safety alignment.