theblackcat102/Qwen3-1.7B-Usefulness-Judge

Warm
Public
2B
BF16
40960
Hugging Face
Overview

Overview

The theblackcat102/Qwen3-1.7B-Usefulness-Judge is a specialized 2 billion parameter model built on the Qwen3 architecture. Its primary function is to act as a "usefulness judge," assessing whether a given response effectively answers a specific question or merely avoids it. This model is particularly useful for automated content evaluation and quality control systems.

Key Capabilities

  • Response Usefulness Prediction: Determines if a response is useful to a question, offering both direct and reasoning-based evaluations.
  • Reasoning Mode: Provides a detailed reasoning process before delivering a 'YES' or 'NO' verdict on usefulness, achieving an average F1 score of 0.8248.
  • Direct Answer Mode: Offers a straightforward 'YES' or 'NO' verdict, with an accuracy of 86.44% and an F1 score of 0.7681.
  • Robust Performance: Demonstrates consistent performance across multiple evaluations, with a low standard deviation in its F1 scores.

Good For

  • Automated Content Moderation: Filtering out unhelpful or evasive responses in chatbots or Q&A systems.
  • Response Quality Assurance: Automatically evaluating the relevance and directness of AI-generated or human-generated text.
  • Feedback Systems: Providing programmatic feedback on the utility of conversational AI outputs.
  • Benchmarking: Serving as a metric for evaluating the helpfulness of other language models' responses.