pymlex/Qwen2.5-0.5B-Human
pymlex/Qwen2.5-0.5B-Human is a 0.5 billion parameter language model, DPO fine-tuned from Qwen/Qwen2.5-0.5B-Instruct with a 32K context length. Developed by pymlex, its primary differentiator is its optimization for paraphrasing Spanish academic abstracts to reduce AI-detection scores from the danibor/oculus-v2.0-multilingual detector. This model is specifically designed to generate text that is less likely to be flagged as AI-generated by a particular detector.
Loading preview...
Overview
pymlex/Qwen2.5-0.5B-Human is a 0.5 billion parameter language model, fine-tuned using Direct Preference Optimization (DPO) from the Qwen/Qwen2.5-0.5B-Instruct base model. Its core purpose is to paraphrase Spanish academic abstracts in a way that reduces their detectability by the danibor/oculus-v2.0-multilingual AI text detector. The model was trained on preference pairs derived from pymlex/ai-generated-texts and the Flaglab/academic-knowledge-abstracts-es corpus.
Key Capabilities
- AI-Detection Score Reduction: Specifically optimized to produce text with lower AI-detection probabilities when evaluated by the Oculus detector.
- Spanish Abstract Paraphrasing: Excels at rephrasing academic abstracts written in Spanish.
- DPO Fine-tuning: Utilizes DPO with a beta of 0.1 to increase the log-probability of completions that yield lower Oculus scores.
Performance Highlights
During DPO, the mean validation AI probability on a 276-text subset decreased significantly from 0.6740 to 0.2437. Post-training evaluation showed the fine-tuned model achieved a mean Oculus probability of 0.2391 on the test set, compared to 0.6532 for the base model. This indicates a substantial reduction in perceived AI-generated likelihood.
Good For
- Researchers or academics working with Spanish abstracts who need to generate paraphrases that are less likely to be identified as AI-generated by the Oculus detector.
- Experiments and research into methods for reducing AI text detection.
Limitations
- Specific Detector Target: Optimization is specifically against the Oculus detector; performance against other AI detectors is not guaranteed.
- Domain Specificity: Primarily focused on Spanish academic abstracts; performance on other text types or languages may vary.