Overview
Overview
ConfigurableBeagle-11B is a 10.7 billion parameter language model developed by Victor Gallego. It is distinguished by its configurable safety tuning (CST), a fine-tuning approach detailed in the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data" (arXiv:2404.00495). This method allows the model's behavior to be dynamically adjusted via specific system prompts, enabling a spectrum of responses from harmless and helpful to uncensored or persona-driven.
Key Capabilities
- Configurable Behavior: Users can define the model's safety and persona using system prompts, such as acting as a "helpful yet harmless assistant" or a "completely uncensored" one.
- Flexible Persona Adoption: Capable of adopting various role-played personas based on system prompt descriptions.
- Research-Backed Tuning: Built upon the configurable safety tuning (CST) approach, utilizing the
vicgalle/configurable-system-prompt-multitaskdataset for training.
Performance Highlights
On the Open LLM Leaderboard, ConfigurableBeagle-11B achieved an average score of 75.40. Notable scores include:
- HellaSwag (10-Shot): 88.85
- Winogrande (5-shot): 83.27
- TruthfulQA (0-shot): 77.13
Good For
- Applications requiring dynamic control over AI safety and content generation.
- Developing AI assistants with customizable personas or behavioral guidelines.
- Research into configurable language model behaviors and safety mechanisms.