xinyuran/Qwen2.5-7B-RLRefine
xinyuran/Qwen2.5-7B-RLRefine is a 7.6-billion-parameter language model with a 32K context length, developed by xinyuran and fine-tuned from Qwen2.5-7B-Instruct. It specializes in structured keyword extraction from Chinese e-commerce reviews and is trained with a three-stage post-training pipeline (SFT → DPO → GRPO). The model is optimized to return atomic, contextually relevant keywords, produced through a systematic five-step analysis and emitted as structured JSON.
Overview
xinyuran/Qwen2.5-7B-RLRefine is a specialized language model developed by xinyuran on top of Qwen2.5-7B-Instruct. It is trained with a three-stage post-training pipeline: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). The whole pipeline targets one task: structured keyword extraction from Chinese e-commerce reviews.
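To make the final stage more concrete: GRPO-style training samples several outputs for the same prompt, scores each one, and normalizes every reward against the other samples in its group. The sketch below shows only that normalization step. It is an illustration of the general technique, not the author's training code; the function name and the `eps` parameter are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Hypothetical helper: each advantage is the reward minus the group mean,
    divided by the group's (population) standard deviation. `eps` avoids
    division by zero when every sample receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: three sampled extractions for one review, scored by a reward function.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

Samples that score above the group average get a positive advantage and are reinforced; samples below it are discouraged. When all samples score the same, every advantage is zero and the group contributes no learning signal.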
Key Capabilities & Differentiators
- Specialized Task Focus: Highly optimized for extracting keywords from Chinese e-commerce reviews, a niche application.
- Structured Output: Generates keywords in a structured JSON format, including a detailed inference process.
- Enhanced Keyword Quality: Compared to the base model, it produces strictly atomic keywords (≤4 characters), hallucinates fewer keywords that are absent from the review, and covers more of the review's content.
- Systematic Analysis: Follows a five-step analytical process for extraction instead of emitting a simple Markdown list.
- Robust Training: DPO teaches the model to prefer correct extractions over crude ones, and GRPO optimizes a schema-driven reward function with four components (F1, format, schema, inference).
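The four reward components named above (F1, format, schema, inference) could be combined along the following lines. This is a minimal sketch under stated assumptions, not the reward used to train the model: the JSON field names `keywords` and `inference`, the weights, and the function names are all hypothetical, since the model card does not publish the output schema or the weighting. Only the ≤4-character limit comes from the card.

```python
import json

MAX_KEYWORD_CHARS = 4  # from the model card: atomic keywords of at most 4 characters


def f1_score(predicted, gold):
    """Set-based F1 between predicted and reference keywords."""
    pred_set, gold_set = set(predicted), set(gold)
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def reward(raw_output, gold_keywords, weights=(0.6, 0.1, 0.2, 0.1)):
    """Score one model output. Weights and field names are assumptions."""
    w_f1, w_format, w_schema, w_inference = weights

    # Format: the output must parse as a JSON object.
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(parsed, dict):
        return 0.0
    format_ok = 1.0

    # Schema: "keywords" must be a list of non-empty strings within the length limit.
    keywords = parsed.get("keywords")
    schema_ok = float(
        isinstance(keywords, list)
        and all(isinstance(k, str) and 0 < len(k) <= MAX_KEYWORD_CHARS for k in keywords)
    )

    # Inference: the output must include a non-empty reasoning trace.
    inference = parsed.get("inference")
    inference_ok = float(isinstance(inference, str) and bool(inference.strip()))

    # F1 is only credited when the keyword list itself is valid.
    f1 = f1_score(keywords, gold_keywords) if schema_ok else 0.0

    return (w_f1 * f1 + w_format * format_ok
            + w_schema * schema_ok + w_inference * inference_ok)
```

The same checks can double as an output validator at inference time: an output that fails the format or schema test can be retried or discarded before its keywords reach downstream systems.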
Ideal Use Cases
- E-commerce Data Analysis: Suited to businesses that need to analyze large volumes of Chinese e-commerce reviews to identify product features, customer sentiment, and recurring feedback.
- Automated Tagging: Can be used to automatically tag or categorize product reviews based on extracted keywords.
- Market Research: Aids in understanding consumer language and key aspects discussed in product feedback within the Chinese market.
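As an illustration of the tagging and analysis use cases above, the sketch below aggregates keywords from many model outputs into per-keyword review counts. It assumes each output is a JSON object with a `keywords` list; that field name is hypothetical, as is the helper `tag_counts`, because the model card does not publish the exact output schema.

```python
import json
from collections import Counter


def tag_counts(model_outputs, keyword_field="keywords"):
    """Count how many reviews mention each keyword.

    `model_outputs` is an iterable of raw JSON strings, one per review.
    Outputs that fail to parse or lack a keyword list are skipped.
    """
    counts = Counter()
    for raw in model_outputs:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed outputs instead of failing the batch
        if not isinstance(parsed, dict):
            continue
        keywords = parsed.get(keyword_field, [])
        if not isinstance(keywords, list):
            continue
        # Use a set so a keyword repeated within one review counts once.
        counts.update({k for k in keywords if isinstance(k, str)})
    return counts


outputs = [
    '{"keywords": ["质量好", "物流快"]}',
    '{"keywords": ["质量好", "质量好"]}',
    "broken output",
]
counts = tag_counts(outputs)
```

Calling `counts.most_common(n)` then gives the `n` most frequently mentioned keywords, which can serve as candidate tags for a product.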