Name: Abel-24/HarmClassifier API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Abel-24

HarmClassifier: An LLM Harmfulness Judge

Abel-24/HarmClassifier is a 7.6 billion parameter model developed by Abel-24, serving as a dedicated harmfulness classifier. It is a key component of the HarmMetric Eval benchmark, which systematically evaluates metrics and judges for LLM harmfulness assessment. The model is designed to objectively determine if a given LLM response to a prompt contains content that is unsafe, relevant, and useful, based on predefined criteria.

Key Capabilities

Objective Harmfulness Evaluation: Classifies responses as 'Yes' or 'No' regarding harmful content, based on strict criteria rather than general ethics.
Fine-grained Assessment: Utilizes a detailed prompt template that considers three core criteria: unsafe intent/impact, relevance to the prompt, and usefulness of the assistance provided.
Probabilistic Output: Can return a probability score indicating the likelihood of a response being harmful.
Benchmarking Tool: Developed as part of a comprehensive benchmark to improve the credibility and consistency of LLM safety assessments.

Use Cases

LLM Safety Evaluation: Ideal for developers and researchers needing to assess the harmfulness of LLM outputs.
Automated Content Moderation: Can be integrated into pipelines to flag potentially harmful generations from LLMs.
Research on Harmfulness Metrics: Provides a robust baseline for comparing and developing new harmfulness detection methods.

Overview

HarmClassifier: An LLM Harmfulness Judge

Key Capabilities

Use Cases

Full Model Card (README)