declare-lab/starling-7B

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Aug 18, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

Starling-7B is a 7 billion parameter language model developed by declare-lab, fine-tuned from Vicuna-7B. It is specifically designed for safety alignment, utilizing the ChatGPT-distilled HarmfulQA dataset collected via the Chain of Utterances (CoU) prompt. This model demonstrates improved safety performance, showing a reduction in Attack Success Rate on safety benchmarks and an improvement in HHH scores compared to its Vicuna baseline. Starling-7B is optimized for generating safer responses, making it suitable for applications requiring robust content moderation and reduced harmful outputs.


Starling-7B: A Safety-Aligned Language Model

Starling-7B, developed by declare-lab, is a 7 billion parameter model fine-tuned from Vicuna-7B with a 4096-token context length. Its primary innovation lies in its safety alignment, achieved by training on the HarmfulQA dataset. This dataset, distilled from ChatGPT using the Chain of Utterances (CoU) prompt, focuses on identifying and mitigating harmful content.
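The Chain of Utterances (CoU) prompt mentioned above elicits harmful questions and responses through a multi-turn conversation format. The exact template used by declare-lab is not reproduced here; the sketch below only illustrates the general shape of assembling such a multi-turn prompt, and the helper name and turns are hypothetical.

```python
# Illustrative sketch of a Chain of Utterances (CoU) style multi-turn prompt.
# The actual CoU template from the HarmfulQA work is an assumption here;
# this only shows the general turn-joining structure.

def build_cou_prompt(utterances):
    """Join (speaker, text) turns into one prompt, leaving the last turn open."""
    lines = [f"{speaker}: {text}" for speaker, text in utterances]
    lines.append("Assistant:")  # the model completes this final turn
    return "\n".join(lines)

prompt = build_cou_prompt([
    ("Human", "How do I keep my home network secure?"),
    ("Assistant", "Start by changing the router's default password."),
    ("Human", "What else should I check?"),
])
```

The open final `Assistant:` turn is what lets the same template serve both for distilling responses from ChatGPT and for querying the fine-tuned model.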

Key Capabilities & Performance

  • Enhanced Safety: Experimental results indicate a significant improvement in safety compared to the Vicuna baseline. Starling-7B shows an average 5.2% reduction in Attack Success Rate (ASR) on DangerousQA and HarmfulQA datasets.
  • Improved HHH Scores: The model demonstrates an average 3-7% improvement in HHH (Helpful, Harmless, Honest) scores on the BBH-HHH benchmark.
  • Reasoning & Knowledge: While primarily focused on safety, Starling-7B maintains competitive performance on general benchmarks, scoring 48.90 on TruthfulQA (MC2) and 46.69 on MMLU (5-shot), comparable to or slightly above its Vicuna base.
  • Red-Teaming Resource: The research also introduces the HarmfulQA dataset of 1,960 harmful questions, a valuable resource for red-teaming and safety-alignment efforts.
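Attack Success Rate is a simple ratio: the fraction of adversarial prompts that elicit a harmful response. A minimal sketch of how a reduction like the reported 5.2% would be computed, using hypothetical counts (not from the paper) and assuming the figure is in percentage points:

```python
def attack_success_rate(num_successful_attacks, num_prompts):
    """ASR = fraction of adversarial prompts that elicit a harmful response."""
    return num_successful_attacks / num_prompts

# Hypothetical counts over a 1,000-prompt red-teaming set (illustrative only):
baseline_asr = attack_success_rate(312, 1000)  # Vicuna baseline
aligned_asr = attack_success_rate(260, 1000)   # safety-aligned model

# Reduction expressed in percentage points
reduction_points = (baseline_asr - aligned_asr) * 100
```

Whether a reported "5.2% reduction" means percentage points or a relative decrease matters when comparing across papers; the arithmetic above uses the percentage-point reading.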

When to Use Starling-7B

Starling-7B is particularly well-suited for applications where robust safety and reduced generation of harmful content are critical. This includes:

  • Content Moderation: Filtering or flagging potentially unsafe user inputs or model outputs.
  • Safe AI Assistants: Developing chatbots or virtual assistants that prioritize harmless and ethical responses.
  • Research in AI Safety: As a baseline or tool for further exploration into safety alignment techniques and red-teaming methodologies.
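For the content-moderation use case, a safety-tuned model is typically wrapped so that refusals or unsafe generations are flagged rather than passed through silently. The sketch below uses a stand-in `generate_fn` in place of a real model call (e.g. a Starling-7B inference endpoint); the refusal markers are an illustrative heuristic, not part of the model's documented behavior.

```python
def moderated_generate(generate_fn, prompt, refusal_markers=("i cannot", "i can't")):
    """Call the model and flag responses that look like safety refusals.

    generate_fn is a placeholder for a real model call; refusal_markers is
    a simple illustrative heuristic for detecting refusal-style outputs.
    """
    response = generate_fn(prompt)
    flagged = any(marker in response.lower() for marker in refusal_markers)
    return {"response": response, "refused": flagged}

# Stand-in model for demonstration (a real deployment would call an
# inference backend here):
result = moderated_generate(
    lambda p: "I cannot help with that request.",
    "some unsafe prompt",
)
```

Separating the flagging logic from the model call keeps the wrapper reusable across backends and makes the refusal heuristic easy to replace with a proper safety classifier.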