vicgalle/Configurable-Llama-3.1-8B-Instruct

Warm
Public
8B
FP8
32768
Jul 24, 2024
License: apache-2.0
Hugging Face
Overview

Overview

vicgalle/Configurable-Llama-3.1-8B-Instruct is a fine-tuned Llama-3.1-8B-Instruct model developed by Victor Gallego. Its core innovation lies in configurable safety tuning (CST), a method detailed in the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data". This allows the model's behavior to be dynamically controlled via system prompts, enabling a spectrum from highly safe to completely uncensored responses.

Key Capabilities

  • Dynamic Safety Configuration: Users can switch between different safety profiles (e.g., "helpful yet harmless," "completely uncensored," "harmful") by changing the system prompt.
  • Role-Play Personas: Supports system prompts for defining specific role-played personas.
  • Research Tool: Primarily intended as a research artifact for studying safety and alignment in large language models.

Use Cases

This model is particularly useful for:

  • Safety Research: Investigating the impact of safety tuning and exploring methods to control model outputs.
  • Alignment Studies: Understanding how different system prompts influence model behavior and alignment.
  • Controlled Content Generation: Experimenting with generating content across various safety levels for analytical purposes.