Overview
Overview
vicgalle/Configurable-Llama-3.1-8B-Instruct is a fine-tuned Llama-3.1-8B-Instruct model developed by Victor Gallego. Its core innovation lies in configurable safety tuning (CST), a method detailed in the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data". This allows the model's behavior to be dynamically controlled via system prompts, enabling a spectrum from highly safe to completely uncensored responses.
Key Capabilities
- Dynamic Safety Configuration: Users can switch between different safety profiles (e.g., "helpful yet harmless," "completely uncensored," "harmful") by changing the system prompt.
- Role-Play Personas: Supports system prompts for defining specific role-played personas.
- Research Tool: Primarily intended as a research artifact for studying safety and alignment in large language models.
Use Cases
This model is particularly useful for:
- Safety Research: Investigating the impact of safety tuning and exploring methods to control model outputs.
- Alignment Studies: Understanding how different system prompts influence model behavior and alignment.
- Controlled Content Generation: Experimenting with generating content across various safety levels for analytical purposes.