rrivera1849/style-aware-paraphraser-mistral7b
rrivera1849/style-aware-paraphraser-mistral7b is a 7-billion-parameter model derived from Mistral-7B-Instruct-v0.3, fine-tuned by rrivera1849, with a 4096-token context length. It paraphrases machine-generated text in the style of a target human author, which makes the text harder for machine-text detectors to identify. The model was developed for research into the limits of machine-text detection and is most effective on social-media-like text.
Overview
This model, rrivera1849/style-aware-paraphraser-mistral7b, is a 7-billion-parameter Mistral-7B-Instruct-v0.3 derivative, fine-tuned in two stages: Supervised Fine-Tuning (SFT) and Detector-guided Direct Preference Optimization (DPO). Its primary function is to adversarially paraphrase machine-generated text so that it matches the stylistic fingerprint of a specified human author and thereby evades machine-text detectors. Training used the Reddit Million Users Dataset, restricted to comments between 32 and 128 tokens.
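The length restriction on training comments can be illustrated with a small filter. This is a minimal sketch, not the authors' preprocessing code: the function name `filter_by_token_length` is hypothetical, whitespace splitting stands in for the model's real tokenizer, and treating both bounds as inclusive is an assumption.

```python
from typing import Callable, Iterable, List


def filter_by_token_length(
    comments: Iterable[str],
    tokenize: Callable[[str], List[str]] = str.split,  # assumption: stand-in tokenizer
    min_tokens: int = 32,
    max_tokens: int = 128,
) -> List[str]:
    """Keep comments whose token count lies in [min_tokens, max_tokens].

    Both bounds are treated as inclusive here, which is an assumption.
    """
    return [
        comment
        for comment in comments
        if min_tokens <= len(tokenize(comment)) <= max_tokens
    ]
```

To count tokens the way the model does, a tokenizer's own tokenize function could be passed as `tokenize` in place of `str.split`.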
Key Capabilities
- Style-Aware Paraphrasing: Rewrites machine-generated text in a target human author's style, given 16 exemplars from that author.
- Adversarial Evasion: Designed to keep machine-generated text from being flagged by current machine-text detectors, achieving a maximum AUROC(1) of ≈ 0.55 against a suite of nine detectors (an AUROC of 0.5 corresponds to random guessing).
- Iterative Refinement: Inference runs in multiple stages: Mistral-7B produces initial paraphrases, which this model then refines iteratively, with SBERT used for selection.
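The iterative refinement loop described above might look roughly like the following. This is a sketch under stated assumptions, not the released inference code: it assumes selection keeps the candidate whose embedding is closest (by cosine similarity) to the original text, and the names `select_closest`, `refine`, `paraphrase`, `embed`, and `rounds` are all hypothetical. The `paraphrase` callable stands in for a call to this model, and `embed` stands in for an SBERT encoder.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_closest(
    source: str, candidates: List[str], embed: Callable[[str], Vector]
) -> str:
    """Pick the candidate most similar to the source text (assumed criterion)."""
    source_vec = embed(source)
    return max(candidates, key=lambda c: cosine(source_vec, embed(c)))


def refine(
    source: str,
    initial: str,
    paraphrase: Callable[[str], List[str]],  # hypothetical wrapper around the model
    embed: Callable[[str], Vector],          # hypothetical wrapper around SBERT
    rounds: int = 3,                         # assumption: the real count is not stated
) -> str:
    """Repeatedly paraphrase the current text and keep the selected candidate."""
    current = initial
    for _ in range(rounds):
        candidates = paraphrase(current)
        if not candidates:
            break
        current = select_closest(source, candidates, embed)
    return current
```

Keeping `paraphrase` and `embed` as plain callables lets the loop be tested with toy functions before a real generator and sentence encoder are plugged in.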
Good For
- Research on Machine-Text Detection: Ideal for stress-testing detectors, studying stylistic feature retention, and developing new detection defenses.
- Social Media Text: Performs best on short-form, social-media-like text (e.g., Reddit comments, Amazon reviews).
- Academic Research: Supports the findings of the paper "Attacks on Machine-Text Detectors Retain Stylistic Fingerprints" (ICML 2026).
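For readers stress-testing their own detectors, the AUROC figure cited under Key Capabilities can be read as the probability that a detector scores a randomly chosen machine-written text higher than a randomly chosen human-written one. The sketch below computes that quantity directly from pairwise comparisons. It is illustrative only: the function name `auroc` and the convention that higher scores mean "more likely machine-written" are assumptions, and it is not the evaluation code behind the reported numbers.

```python
from typing import Sequence


def auroc(human_scores: Sequence[float], machine_scores: Sequence[float]) -> float:
    """Pairwise AUROC: fraction of (machine, human) pairs ranked correctly.

    Assumes a higher detector score means "more likely machine-written".
    Ties count as half a correct ranking.
    """
    if not human_scores or not machine_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(human_scores) * len(machine_scores))
```

A value of 1.0 means the detector separates the two classes perfectly, while a value near 0.5 means its scores carry almost no information about which texts are machine-written.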