Name: selfhypnosis-ai/Qwen3.5-4B-Creative-Writing-Judge API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: selfhypnosis-ai

selfhypnosis-ai/Qwen3.5-4B-Creative-Writing-Judge: A Specialized LLM for Creative Writing Evaluation

This model, developed by selfhypnosis-ai, is a specialized 4.5 billion parameter LLM-as-a-Judge built on the Qwen 3.5 architecture. It is designed to evaluate and rank creative writing by comparing pairs of responses on accuracy, clarity, and originality. Unlike general-purpose LLMs, which generate text, this model acts as an automated judge: it returns a quantitative preference score between two given responses.

Key Capabilities

LLM-as-a-Judge for Creative Writing: Optimized for pairwise preference evaluation of creative texts.
Logit-Based Evaluation: Derives confidence scores from the log probabilities of the 'A' and 'B' tokens rather than from generated text, for more robust and consistent judgments.
Bias Mitigation: Employs a dual-pass inference strategy (swapping response positions) to effectively reduce positional bias, a common issue in LLM judges.
High Discriminative Ability: Achieves a Combined Separability Score of 87.92 / 100.0 in prompt-isolated Elo evaluations, indicating strong ability to differentiate writing quality.
Leaderboard Stability: Produces highly stable tournament rankings, with an Omega-Squared of 0.9020 in bootstrapped evaluations, meaning rankings stay consistent across different prompt selections.
32K Context Length: Supports a fine-tuned context length of 32,768 tokens, allowing for evaluation of longer creative pieces.
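
The logit-based scoring and dual-pass strategy listed above can be sketched as follows. This is a minimal illustration, not the model's documented pipeline: the function names are hypothetical, the log probabilities are assumed to have been read from the judge's output for the 'A' and 'B' tokens, and averaging the two passes is an assumed way of combining them.

```python
import math


def preference_from_logprobs(logprob_a: float, logprob_b: float) -> float:
    """Two-way softmax over the 'A' and 'B' token log probabilities.

    Returns the probability mass on 'A', renormalized over the two labels.
    """
    m = max(logprob_a, logprob_b)  # subtract the max for numerical stability
    exp_a = math.exp(logprob_a - m)
    exp_b = math.exp(logprob_b - m)
    return exp_a / (exp_a + exp_b)


def debiased_preference(first_pass, swapped_pass) -> float:
    """Combine two judge passes into one preference for response 1.

    first_pass:   (logprob 'A', logprob 'B') with response 1 shown in slot A.
    swapped_pass: (logprob 'A', logprob 'B') with response 1 shown in slot B.
    Averaging the two passes is an assumption made for this sketch.
    """
    p_first = preference_from_logprobs(*first_pass)            # response 1 is 'A'
    p_swapped = 1.0 - preference_from_logprobs(*swapped_pass)  # response 1 is 'B'
    return 0.5 * (p_first + p_swapped)


# Illustrative numbers only: a judge that favors slot A regardless of content
# gives the same log probabilities in both passes, so the bias cancels out.
score = debiased_preference((-0.2, -1.7), (-0.2, -1.7))
```

Under this scheme a purely positional preference averages out to 0.5, while a preference that follows the content across both passes survives the swap.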

Good for

Automated evaluation and ranking of creative writing outputs from other LLMs or human writers.
Developing and maintaining leaderboards for creative writing models.
Research into LLM-as-a-Judge methodologies, particularly for subjective tasks.
Applications requiring objective, bias-mitigated comparison of creative text quality.
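
For the leaderboard use case, pairwise preference scores can feed a rating system. The sketch below uses the standard Elo update with the judge's preference as a soft match result. The function name, the K-factor of 32, and the use of a soft score are assumptions for illustration; the listing does not describe how its prompt-isolated Elo evaluations are computed.

```python
def elo_update(rating_1: float, rating_2: float, score_1: float, k: float = 32.0):
    """Standard Elo update for one pairwise comparison.

    score_1 is the outcome for contestant 1 in [0, 1]; here it is assumed to be
    the judge's preference score (1.0 = clear win, 0.5 = tie).
    """
    # Expected score of contestant 1 under the Elo logistic model.
    expected_1 = 1.0 / (1.0 + 10.0 ** ((rating_2 - rating_1) / 400.0))
    delta = k * (score_1 - expected_1)
    # Zero-sum: whatever contestant 1 gains, contestant 2 loses.
    return rating_1 + delta, rating_2 - delta


# Two equally rated writers, with a clear judged win for the first.
new_1, new_2 = elo_update(1500.0, 1500.0, 1.0)
```

With equal starting ratings the expected score is 0.5, so a clear win moves each rating by half the K-factor.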
