vicgalle/Configurable-Llama-3.1-8B-Instruct

Warm
Public
8B
FP8
32768
1
Jul 24, 2024
License: apache-2.0
Hugging Face

Configurable-Llama-3.1-8B-Instruct is a Llama-3.1-8B-Instruct model fine-tuned by Victor Gallego using configurable safety tuning (CST). This approach allows users to dynamically adjust the model's safety and helpfulness behavior through specific system prompts. It is designed for research in safety and alignment, enabling exploration of both harmless and uncensored content generation.

Overview

Overview

vicgalle/Configurable-Llama-3.1-8B-Instruct is a fine-tuned Llama-3.1-8B-Instruct model developed by Victor Gallego. Its core innovation lies in configurable safety tuning (CST), a method detailed in the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data". This allows the model's behavior to be dynamically controlled via system prompts, enabling a spectrum from highly safe to completely uncensored responses.

Key Capabilities

  • Dynamic Safety Configuration: Users can switch between different safety profiles (e.g., "helpful yet harmless," "completely uncensored," "harmful") by changing the system prompt.
  • Role-Play Personas: Supports system prompts for defining specific role-played personas.
  • Research Tool: Primarily intended as a research artifact for studying safety and alignment in large language models.

Use Cases

This model is particularly useful for:

  • Safety Research: Investigating the impact of safety tuning and exploring methods to control model outputs.
  • Alignment Studies: Understanding how different system prompts influence model behavior and alignment.
  • Controlled Content Generation: Experimenting with generating content across various safety levels for analytical purposes.