Model Overview
vicgalle/Configurable-Hermes-2-Pro-Llama-3-8B is an 8 billion parameter language model based on NousResearch/Hermes-2-Pro-Llama-3-8B. Developed by Victor Gallego, this model introduces a novel approach called Configurable Safety Tuning (CST), detailed in the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data" (arXiv:2404.00495).
Key Capabilities
- Dynamic Behavior Control: Users can configure the model's safety and helpfulness levels at inference time using specific system prompts. This allows for a spectrum of behaviors, from strictly harmless to completely uncensored or even intentionally harmful.
- Research Focus: The model serves as a research artifact to explore safety and alignment in large language models, demonstrating the impact of configurable system prompts on output generation.
- Flexible System Prompts: Supports various system prompts, including those for helpful/harmless, uncensored, harmful, unbiased, or role-played personas.
Use Cases
- AI Safety Research: Ideal for researchers studying methods to control and understand LLM safety, bias, and alignment.
- Prototyping: Useful for experimenting with different moderation policies or persona-based interactions without retraining the model.
Disclaimer
It is important to note that this model can be configured to generate harmful or offensive material. Its public availability is strictly for research purposes in the fields of safety and alignment.