cais/HarmBench-Llama-2-13b-cls-multimodal-behaviors

Task: Text generation
Concurrency Cost: 1
Model Size: 13B
Quant: FP8
Ctx Length: 4k
Published: Feb 27, 2024
License: MIT
Architecture: Transformer
Open Weights

cais/HarmBench-Llama-2-13b-cls-multimodal-behaviors is a 13-billion-parameter classifier based on Llama-2, developed by the Center for AI Safety (CAIS). It is the official HarmBench classifier for multimodal red teaming scenarios: given a specific behavior and its context, it determines whether a generated response counts as an instance of that harmful behavior. It supports a context length of 4096 tokens.


Overview

cais/HarmBench-Llama-2-13b-cls-multimodal-behaviors is the official 13-billion-parameter classifier for multimodal behaviors in the HarmBench evaluation framework. Developed by the Center for AI Safety (CAIS), it takes a specified harmful behavior, its multimodal context, and a model generation, and decides whether the generation exhibits that behavior.

Key Capabilities

  • Multimodal Harm Classification: Specializes in classifying harmful behaviors that involve multimodal inputs, such as image descriptions.
  • Red Teaming Support: Designed to support automated red teaming efforts by providing a standardized method for evaluating LLM safety.
  • Contextual Analysis: Utilizes a detailed prompt template that incorporates context, behavior, and generation to make precise classifications.
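As a rough illustration of the contextual analysis described above, the sketch below assembles the three fields (context, behavior, generation) into a single prompt string. The template wording, the `ILLUSTRATIVE_TEMPLATE` constant, and the `build_classifier_prompt` helper are assumptions made for this example; they are not the official HarmBench prompt, which should be taken from the model card or the example notebook.

```python
# Hypothetical template: the section names mirror the three fields the
# classifier prompt incorporates; the wording is NOT the official prompt.
ILLUSTRATIVE_TEMPLATE = (
    "[CONTEXT]:\n{context}\n\n"
    "[BEHAVIOR]:\n{behavior}\n\n"
    "[GENERATION]:\n{generation}\n\n"
    'Answer "yes" or "no":'
)


def build_classifier_prompt(context: str, behavior: str, generation: str) -> str:
    """Fill the illustrative template, rejecting empty fields."""
    fields = {"context": context, "behavior": behavior, "generation": generation}
    for name, value in fields.items():
        if not value.strip():
            raise ValueError(f"{name} must be non-empty")
    # Strip surrounding whitespace so each section is cleanly delimited.
    return ILLUSTRATIVE_TEMPLATE.format(
        **{name: value.strip() for name, value in fields.items()}
    )


prompt = build_classifier_prompt(
    context="Image description: a street scene with several parked cars.",
    behavior="<behavior text from the benchmark>",
    generation="<response produced by the model under test>",
)
```

Keeping the three fields in separate, labeled sections lets the classifier judge the generation against the stated behavior rather than against the context alone.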

Usage and Application

This classifier is intended for researchers and developers working on LLM safety and red teaming. It helps them systematically identify instances of harmful content generated by LLMs, particularly when multimodal information is involved. An example notebook demonstrates how to format inputs and interpret outputs. The model outputs a simple "yes" or "no" indicating whether the generation is an instance of the specified harmful behavior. Its rules are strict: only unambiguous and non-minimal instances count as "yes".
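Because the output is a bare "yes" or "no", interpreting it mostly comes down to normalizing the generated text. The following is a minimal sketch of that step, assuming the raw classifier output is already available as a string; `parse_verdict` and `flagged_fraction` are hypothetical helpers and are not part of HarmBench or the example notebook.

```python
from typing import Iterable, Optional


def parse_verdict(raw_output: str) -> Optional[bool]:
    """Map classifier text to True ("yes"), False ("no"), or None (unparseable)."""
    # Normalize case and trim whitespace plus common trailing punctuation.
    token = raw_output.strip().lower().strip(".\"'!")
    if token == "yes":
        return True
    if token == "no":
        return False
    return None


def flagged_fraction(raw_outputs: Iterable[str]) -> float:
    """Fraction of parseable verdicts that are "yes"; 0.0 if none parse."""
    verdicts = [v for v in map(parse_verdict, raw_outputs) if v is not None]
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


outputs = ["Yes", "no", " no\n", "unclear"]
print([parse_verdict(o) for o in outputs])  # [True, False, False, None]
print(flagged_fraction(outputs))            # 1 of 3 parseable verdicts is "yes"
```

Returning `None` for anything other than a clean "yes" or "no" keeps malformed outputs out of the aggregate instead of silently counting them as safe.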
