xinyuran/Qwen2.5-7B-RLRefine

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:May 9, 2026License:mitArchitecture:Transformer Open Weights Warm

xinyuran/Qwen2.5-7B-RLRefine is a 7.6 billion parameter, 32K context length language model developed by xinyuran, fine-tuned from Qwen2.5-7B-Instruct. It specializes in structured keyword extraction from Chinese e-commerce reviews, utilizing a three-stage reinforcement learning pipeline (SFT → DPO → GRPO). This model is optimized to provide atomic, contextually relevant keywords with a systematic five-step analysis and structured JSON output.

Loading preview...

Overview

xinyuran/Qwen2.5-7B-RLRefine is a specialized language model built upon the Qwen2.5-7B-Instruct base, developed by xinyuran. It undergoes a unique three-stage reinforcement learning (RL) pipeline: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Goal-oriented Reinforcement Learning with Policy Optimization (GRPO). This training regimen specifically targets structured keyword extraction from Chinese e-commerce reviews.

Key Capabilities & Differentiators

  • Specialized Task Focus: Highly optimized for extracting keywords from Chinese e-commerce reviews, a niche application.
  • Structured Output: Generates keywords in a structured JSON format, including a detailed inference process.
  • Enhanced Keyword Quality: Compared to the base model, it produces strictly atomic keywords (≤4 characters), significantly reduces hallucination, and ensures comprehensive coverage.
  • Systematic Analysis: Employs a five-step analytical process for extraction, moving beyond simple markdown lists.
  • Robust Training: Leverages a sophisticated RL pipeline, including DPO to differentiate between correct and crude extractions, and GRPO with a schema-driven reward function (F1, format, schema, inference).

Ideal Use Cases

  • E-commerce Data Analysis: Perfect for businesses needing to analyze large volumes of Chinese e-commerce reviews to identify product features, customer sentiment, and common feedback.
  • Automated Tagging: Can be used to automatically tag or categorize product reviews based on extracted keywords.
  • Market Research: Aids in understanding consumer language and key aspects discussed in product feedback within the Chinese market.