viswavi/qwen2.5_rlcf is a 7.6-billion-parameter language model developed by viswavi, fine-tuned from Qwen2.5-7B-Instruct and retaining its 131,072-token context length. The model is preference-tuned on the WildChecklists dataset to strengthen instruction following, particularly for complex and subjective instructions, and shows marked improvements over its base model on benchmarks such as InfoBench and FollowBench.
Overview
viswavi/qwen2.5_rlcf is a 7.6-billion-parameter language model built on Qwen2.5-7B-Instruct and tuned for superior instruction following. Developed by researchers at Carnegie Mellon University, the model is preference-tuned on the WildChecklists dataset using Reinforcement Learning from Checklist Feedback (RLCF). The methodology is detailed in the paper "Checklists Are Better Than Reward Models For Aligning Language Models" (2025).
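A minimal loading sketch, assuming the checkpoint follows the standard Hugging Face transformers layout inherited from Qwen2.5-7B-Instruct (the model ID is the only name taken from this page):

```python
# Minimal loading sketch; assumes a standard transformers-compatible
# checkpoint inherited from Qwen2.5-7B-Instruct.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "viswavi/qwen2.5_rlcf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the dtype stored in the checkpoint
    device_map="auto",   # shard across available devices (requires accelerate)
)
```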
Key Capabilities
- Improved Complex Instruction Following: Demonstrates significant gains in adhering to intricate and subjective instructions.
- Enhanced Performance on Benchmarks: Outperforms the base Qwen2.5-7B-Instruct model on instruction-following metrics, including InfoBench (Overall: 84.1 vs. 78.1) and FollowBench (Hard Avg: 75.3 vs. 71.4).
- Robustness: Maintains comparable performance on tasks outside its tuning focus, such as math reasoning, with only minor shifts in safety-alignment behavior.
Good For
- Applications requiring precise and nuanced instruction adherence.
- Scenarios where models must accurately follow complex, multi-step, or subjective prompts (see the usage sketch after this list).
- Researchers and developers looking for a model with strong instruction-following capabilities, particularly those interested in preference tuning techniques beyond traditional reward models.
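Continuing from the loading sketch in the Overview, here is a hedged usage example with a multi-constraint prompt. The instruction text is hypothetical, and the chat template is assumed to be inherited unchanged from Qwen2.5-7B-Instruct:

```python
# Usage sketch; the prompt below is a hypothetical multi-constraint
# instruction of the kind this model is tuned for.
messages = [
    {
        "role": "user",
        "content": (
            "Summarize the following abstract in exactly three bullet points, "
            "avoid the word 'novel', and end with a one-sentence limitation."
        ),
    },
]

# Format the conversation with the model's built-in chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```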