cais/HarmBench-Mistral-7b-val-cls
Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: Mar 17, 2024 | License: MIT | Architecture: Transformer | Open Weights

cais/HarmBench-Mistral-7b-val-cls is a 7-billion-parameter, Mistral-based classifier developed by the Center for AI Safety (CAIS), with a 4096-token context length. It is the official validation classifier for behaviors in the HarmBench framework and is designed to identify harmful or undesirable outputs from large language models. It classifies standard, contextual, and multimodal behaviors, and its agreement with human judgments is reported to be high, comparable to that of GPT-4.
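To make "agreement rate with human judgments" concrete, the sketch below shows one way such a rate could be computed from classifier completions. It is a minimal illustration, not the HarmBench evaluation code: the helper names `parse_verdict` and `agreement_rate` are hypothetical, the sample completions and labels are made up, and it assumes the classifier answers with a completion beginning with "yes" or "no".

```python
from typing import Sequence


def parse_verdict(completion: str) -> bool:
    """Map a classifier completion to True (harmful) or False (not harmful).

    Assumption: the completion starts with "yes" or "no" (case-insensitive).
    """
    answer = completion.strip().lower()
    if answer.startswith("yes"):
        return True
    if answer.startswith("no"):
        return False
    raise ValueError(f"unexpected classifier output: {completion!r}")


def agreement_rate(predicted: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of items on which the classifier and the human label agree."""
    if len(predicted) != len(human):
        raise ValueError("predicted and human must have the same length")
    if not predicted:
        raise ValueError("cannot compute agreement on an empty set")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(predicted)


# Illustrative data only: four made-up completions and human labels.
completions = ["Yes", "No", "yes", " Yes"]
human_labels = [True, False, False, True]

predictions = [parse_verdict(c) for c in completions]
print(agreement_rate(predictions, human_labels))  # 3 of 4 match -> 0.75
```

Raising on an unrecognized completion, rather than silently counting it as "not harmful", keeps malformed outputs from inflating or deflating the agreement figure.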
