amiya/qwen2.5-3b-gec-v2

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 17, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The amiya/qwen2.5-3b-gec-v2 is a 3.1 billion parameter Qwen2.5-3B-Instruct model, fine-tuned by amiya using LoRA for English grammatical error correction (GEC). This version significantly improves upon its predecessor, achieving an F0.5 score of 0.533 on the BEA-dev dataset by adapting all 36 transformer layers and training on a mixed corpus of 42,491 minimal-edit pairs. It is specifically optimized for high-accuracy, single-pass grammatical correction of English text.

Loading preview...

Model Overview

amiya/qwen2.5-3b-gec-v2 is a LoRA fine-tune of the Qwen/Qwen2.5-3B-Instruct model, specifically designed for English grammatical error correction (GEC). This version represents a significant upgrade over its predecessor, qwen2.5-3b-gec-bea2019.

Key Improvements and Capabilities

  • Enhanced Performance: Achieves an F0.5 score of 0.533 on the held-out BEA-dev dataset, a notable improvement of +0.038 F0.5 compared to v1 (0.495).
  • Expanded Training Data: Trained on a larger, mixed corpus of 42,491 minimal-edit pairs, combining BEA-2019 W&I+LOCNESS and a subset of Grammarly Coedit GEC data.
  • Increased LoRA Adaptation: Utilizes LoRA adaptation across all 36 transformer layers (compared to 16 in v1), resulting in 14.97 million trainable parameters.
  • Optimized Training: Underwent 5,000 iterations on the expanded dataset, with the best snapshot promoted from iteration 3,500.
  • Seamless Integration: LoRA weights are fused into the base model, allowing it to be used as a drop-in replacement for the original Qwen2.5-3B-Instruct.

Recommended Usage

This model is best used for correcting grammatical errors in English text. It is recommended to use a specific system prompt: "Correct the grammar of the user text. Preserve meaning." For decoding, greedy (temperature 0) is advised, with max_new_tokens set to approximately 1.5 times the prompt length. A post-processing step can further enhance F0.5 scores by trimming tokenization artifacts.

Limitations

While highly effective, the model's ERRANT F0.5 of 0.533 is below some state-of-the-art benchmarks for GEC. It is English-only and performs best on sequences up to 256 tokens; longer inputs may degrade performance. Greedy decoding is preferred, as higher temperatures can lead to less faithful corrections.