LLM360/AmberSafe

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Dec 15, 2023 · License: apache-2.0 · Architecture: Transformer

LLM360/AmberSafe is a 7-billion-parameter, English-language instruction model from LLM360. Built on the LLaMA-7B architecture with LLM360/AmberChat as its base model, it is safety-finetuned to generate safe and helpful responses, which distinguishes it from general-purpose LLMs. Finetuning uses direct preference optimization (DPO) on a filtered preference dataset in which each chosen response is safe and each rejected response is unsafe, making the model suitable for applications requiring robust content moderation and responsible AI interaction.


AmberSafe: A Safety-Aligned Instruction Model

AmberSafe is a 7 billion parameter instruction-tuned language model developed by LLM360, part of their Pebble model series. It is built on the LLaMA-7B architecture and uses LLM360/AmberChat as its base model. The primary differentiator for AmberSafe is its dedicated safety finetuning, making it particularly suitable for applications where generating safe and appropriate content is paramount.

Key Capabilities

  • Safety-Focused Responses: Specifically trained to provide safe and helpful answers, minimizing the generation of harmful or undesirable content.
  • Instruction Following: Capable of understanding and executing instructions, leveraging its base as an instruction model.
  • English Language Support: Optimized for processing and generating text in English.
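To illustrate basic usage, the sketch below builds a single-turn chat prompt and shows how the model could be loaded with Hugging Face transformers. The prompt template is an assumption (a Vicuna-style format commonly used by AmberChat-family models), not a documented AmberSafe requirement; consult the model card for the exact template before relying on it.

```python
def build_prompt(instruction: str) -> str:
    """Format a user instruction into a single-turn chat prompt.

    The template below is assumed (Vicuna-style), not confirmed for
    AmberSafe; verify against the official model card.
    """
    system = (
        "A chat between a curious human and an artificial intelligence "
        "assistant. The assistant gives helpful, detailed, and polite "
        "answers to the human's questions."
    )
    return f"{system}\n### Human: {instruction}\n### Assistant:"


# Illustrative loading and generation with transformers (not executed here):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("LLM360/AmberSafe")
# model = AutoModelForCausalLM.from_pretrained("LLM360/AmberSafe")
# inputs = tok(build_prompt("How do I report harmful content?"),
#              return_tensors="pt")
# out = model.generate(**inputs, max_new_tokens=256)
# print(tok.decode(out[0], skip_special_tokens=True))
```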

Finetuning Details

AmberSafe was finetuned using a Direct Preference Optimization (DPO) method. The training involved the PKU-Alignment/PKU-SafeRLHF dataset, which was carefully filtered to ensure that for each preference pair, the chosen response was safe and the rejected one was unsafe. This rigorous process enhances the model's ability to discern and produce safe outputs.
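The two steps described above can be sketched in a few lines: first, keep only preference pairs whose chosen response is safe and whose rejected response is unsafe; second, apply the standard DPO objective to each surviving pair. The field names in the filter are illustrative placeholders, not the actual PKU-SafeRLHF schema, and the loss function is the generic DPO formulation rather than LLM360's specific training code.

```python
import math


def filter_safe_pairs(records):
    """Keep only pairs where the chosen response is safe and the rejected
    one is unsafe, mirroring the filtering described for the training data.
    The field names here are hypothetical, not the dataset's real schema.
    """
    return [
        r for r in records
        if r["chosen_is_safe"] and not r["rejected_is_safe"]
    ]


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r))).
    A larger margin in favor of the chosen response lowers the loss.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With a zero margin the loss is log 2 (the model is indifferent between the two responses); as the policy assigns relatively more probability to the safe chosen response, the loss decreases toward zero.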

Evaluation

On MT-Bench, AmberSafe achieved a score of 4.725, compared with 5.428 for its base model, LLM360/AmberChat, and 2.488 for the pretrained LLM360/Amber. The modest reduction relative to AmberChat reflects the usual trade-off between safety alignment and raw helpfulness.

Good For

  • Applications requiring robust content moderation.
  • Chatbots and virtual assistants where safety and politeness are critical.
  • Generating responses in sensitive domains where harmful content must be avoided.
  • Developers looking for a LLaMA-7B-based model with explicit safety alignment.