Abel-24/HarmClassifier

Text Generation · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Concurrency Cost: 1 · Published: Feb 7, 2026 · License: MIT · Architecture: Transformer · Open Weights

Abel-24/HarmClassifier is a 7.6-billion-parameter language model developed by Abel-24, designed specifically as a harmfulness classifier. It is the core component of the HarmMetric Eval benchmark, which focuses on the objective evaluation of LLM responses against specific harmfulness criteria. The model determines whether a response to a harmful prompt is unsafe, relevant, and useful, making it a robust tool for LLM safety assessment.


HarmClassifier: An LLM Harmfulness Judge

Abel-24/HarmClassifier is a 7.6-billion-parameter model developed by Abel-24, serving as a dedicated harmfulness classifier. It is a key component of the HarmMetric Eval benchmark, which systematically evaluates metrics and judges for LLM harmfulness assessment. The model is designed to objectively determine whether a given LLM response to a prompt contains content that is unsafe, relevant, and useful, based on predefined criteria.

Key Capabilities

  • Objective Harmfulness Evaluation: Returns a binary 'Yes'/'No' verdict on harmful content, based on strict criteria rather than general ethical judgment.
  • Fine-grained Assessment: Utilizes a detailed prompt template that considers three core criteria: unsafe intent/impact, relevance to the prompt, and usefulness of the assistance provided.
  • Probabilistic Output: Can return a probability score indicating the likelihood of a response being harmful.
  • Benchmarking Tool: Developed as part of a comprehensive benchmark to improve the credibility and consistency of LLM safety assessments.
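The probabilistic output described above can be derived from the judge's final-token logits. The sketch below shows one common way to do this, assuming the model emits 'Yes'/'No' verdict tokens: a numerically stable softmax over the two logits yields P(harmful). The function name and the two-token setup are illustrative assumptions, not the benchmark's documented interface.

```python
import math

def harm_probability(yes_logit: float, no_logit: float) -> float:
    """Convert the judge's 'Yes'/'No' verdict-token logits into P(harmful).

    Assumes the classifier exposes logits for a 'Yes' (harmful) and a
    'No' (not harmful) token; subtracting the max keeps exp() stable.
    """
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

# Equal logits mean the judge is undecided.
print(harm_probability(2.0, 2.0))  # 0.5
```

In practice the two logits would be read from the model's next-token distribution at the verdict position; the softmax step itself is model-agnostic.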

Use Cases

  • LLM Safety Evaluation: Ideal for developers and researchers needing to assess the harmfulness of LLM outputs.
  • Automated Content Moderation: Can be integrated into pipelines to flag potentially harmful generations from LLMs.
  • Research on Harmfulness Metrics: Provides a robust baseline for comparing and developing new harmfulness detection methods.
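As a sketch of the content-moderation use case above, the snippet below builds a judge prompt over the three criteria and gates responses on the resulting probability. The template wording, function names, and the 0.5 threshold are illustrative assumptions; the benchmark's actual prompt template differs.

```python
def build_judge_prompt(user_prompt: str, response: str) -> str:
    """Assemble a hypothetical judge prompt covering the three criteria
    (unsafe, relevant, useful); not the benchmark's actual template."""
    return (
        "You are a harmfulness judge. Answer 'Yes' or 'No'.\n"
        "Criteria: the response is unsafe, relevant to the prompt, "
        "and provides useful assistance.\n"
        f"Prompt: {user_prompt}\n"
        f"Response: {response}\n"
        "Verdict:"
    )

def should_flag(harm_prob: float, threshold: float = 0.5) -> bool:
    """Flag a generation when the judge's P(harmful) meets the threshold."""
    return harm_prob >= threshold

print(should_flag(0.82))  # True
```

A pipeline would send `build_judge_prompt(...)` to the classifier, convert its verdict into a probability, and pass that to `should_flag` before releasing the generation.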