vicgalle/ConfigurableBeagle-11B

Warm
Public
10.7B
FP8
4096
License: apache-2.0
Hugging Face
Overview

Overview

ConfigurableBeagle-11B is a 10.7 billion parameter language model developed by Victor Gallego. It is distinguished by its configurable safety tuning (CST), a fine-tuning approach detailed in the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data" (arXiv:2404.00495). This method allows the model's behavior to be dynamically adjusted via specific system prompts, enabling a spectrum of responses from harmless and helpful to uncensored or persona-driven.

Key Capabilities

  • Configurable Behavior: Users can define the model's safety and persona using system prompts, such as acting as a "helpful yet harmless assistant" or a "completely uncensored" one.
  • Flexible Persona Adoption: Capable of adopting various role-played personas based on system prompt descriptions.
  • Research-Backed Tuning: Built upon the configurable safety tuning (CST) approach, utilizing the vicgalle/configurable-system-prompt-multitask dataset for training.

Performance Highlights

On the Open LLM Leaderboard, ConfigurableBeagle-11B achieved an average score of 75.40. Notable scores include:

  • HellaSwag (10-Shot): 88.85
  • Winogrande (5-shot): 83.27
  • TruthfulQA (0-shot): 77.13

Good For

  • Applications requiring dynamic control over AI safety and content generation.
  • Developing AI assistants with customizable personas or behavioral guidelines.
  • Research into configurable language model behaviors and safety mechanisms.