cx-cmu/repro-rephraser-4B

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Oct 13, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

cx-cmu/repro-rephraser-4B is a 4-billion-parameter language model based on Qwen3-4B, trained with Reinforcement Learning (RL) to generate high-quality, faithful rephrasings of web content. The model is suited to content summarization and extraction: it identifies and removes irrelevant web elements while preserving meaningful information, making it useful for tasks that require precise text distillation from web pages without losing the original structure and depth.


Model Overview

The cx-cmu/repro-rephraser-4B is a 4-billion-parameter model derived from Qwen3-4B. It was fine-tuned with Reinforcement Learning (RL) as part of the RePro project to specialize in generating high-quality, faithful rephrased versions of web content.
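
As a Qwen3-4B derivative, the model should be loadable through the standard transformers chat interface. The snippet below is a minimal inference sketch; the instruction wording in the prompt is an assumption, since the card does not document an official prompt format.

```python
# Minimal inference sketch using Hugging Face transformers.
# The prompt text is an assumption; the card does not specify
# an official input format for the rephraser.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cx-cmu/repro-rephraser-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
    device_map="auto",
)

raw_page = "Home | About | Contact\n\nQuantum error correction protects ..."
messages = [
    {"role": "user",
     "content": "Rephrase the following web page, keeping all informative "
                f"content and dropping navigation and boilerplate:\n\n{raw_page}"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```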

Key Capabilities

  • Intelligent Content Filtering: Designed to identify and remove irrelevant elements from text, such as website headers, navigation bars, generic footers, unrelated links, and decorative elements.
  • Meaningful Content Preservation: Focuses on retaining all relevant and informative content, including technical terms, key concepts, factual details, reasoning, and examples, ensuring the original context and depth are maintained.
  • Faithful Rephrasing: Aims to paraphrase text without adding external information, assumptions, or claims not present in the original source.
  • Context Length: Supports a context window of 32,768 tokens, allowing substantial text inputs to be processed in a single pass (see the clipping sketch after this list for handling longer pages).
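
A practical consequence of the fixed window is that very long pages need to be clipped or chunked so that the prompt plus the generated rephrasing fit within 32,768 tokens. Below is a minimal clipping sketch that reuses the tokenizer from the overview snippet; the 4,096-token output budget is an assumed value, not a documented recommendation.

```python
# Clip a long document so prompt + generation fit in the 32,768-token window.
# GEN_BUDGET is an assumed headroom value, not a documented recommendation.
MAX_CTX = 32_768
GEN_BUDGET = 4_096

def clip_to_context(text: str) -> str:
    ids = tokenizer(text, truncation=True, max_length=MAX_CTX - GEN_BUDGET)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)
```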

Ideal Use Cases

This model is particularly well-suited for applications requiring:

  • Web Content Summarization: Efficiently distilling core information from web pages.
  • Data Extraction: Cleaning and preparing web-scraped text by removing noise and retaining only pertinent data.
  • Information Retrieval: Enhancing search results or knowledge bases by providing concise, cleaned versions of source documents.
  • Text Preprocessing: Preparing raw web text for further analysis or downstream NLP tasks by ensuring high fidelity to the original meaning while removing extraneous details (a bulk-cleaning sketch follows below).
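
For bulk cleaning of web-scraped text, a simple loop over the model is often a sufficient starting point. The sketch below reuses the model, tokenizer, and clip_to_context helper from the earlier snippets; the prompt wording is again an assumption, and batching and error handling are omitted for brevity.

```python
# Hypothetical bulk-cleaning loop for web-scraped text, reusing the
# `model`, `tokenizer`, and `clip_to_context` objects defined above.
def rephrase(doc: str) -> str:
    messages = [
        {"role": "user",
         "content": "Faithfully rephrase this page, removing headers, "
                    f"navigation, and footers:\n\n{clip_to_context(doc)}"}
    ]
    ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(ids, max_new_tokens=GEN_BUDGET)
    return tokenizer.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)

# scraped_pages: your list of raw documents (here just the earlier example).
scraped_pages = [raw_page]
cleaned_corpus = [rephrase(page) for page in scraped_pages]
```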