Model Overview
kxdw2580/Qwen3-1.7B-Catgirl-test0430 is a 1.7-billion-parameter model based on the Qwen3 architecture, developed by kxdw2580 as a comparative test against Qwen2.5. Despite being a test version, the model demonstrates usable baseline performance. It was fine-tuned twice, with detailed logs available on SwanLab.
Key Characteristics & Findings
- Qwen3 Architecture Exploration: This model investigates the newly released Qwen3's native reasoning capabilities and how they compare to Qwen2.5 after fine-tuning.
- Prompt Sensitivity: Qwen3-1.7B is highly sensitive to system prompts; an incorrect prompt significantly degrades its performance, especially on reasoning tasks. This highlights the importance of precise prompt engineering for Qwen3.
- Reasoning & Long-Context Limitations: The current fine-tuning methods and dataset have impaired the model's ability to switch thinking modes (/no_think or /think), degraded its complex reasoning capabilities, and severely compromised its long-context performance.
- Comparative Performance: In evaluations on the original dataset, Qwen2.5 generally outperformed Qwen3, particularly when an incorrect system prompt was applied. Although Qwen3 achieved lower training loss, its evaluation performance was similar to or worse than Qwen2.5's.
- Real-World Usage: For complex or out-of-distribution questions, both the Qwen2.5-1.5B and Qwen3-1.7B models produced constrained responses with weak logical coherence, though Qwen2.5's responses were slightly more structured.
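As context for the thinking-mode limitation above, here is a minimal sketch of how Qwen3's per-turn soft switches are typically applied: the /think or /no_think tag is appended to the latest user message to toggle reasoning for that turn. The persona prompt and helper name below are hypothetical illustrations, not taken from this model card.

```python
# Sketch of Qwen3's soft-switch convention (assumption: standard Qwen3 usage,
# not specific to this fine-tune). The tag on the last user turn toggles
# whether the model emits a reasoning ("thinking") block.

def build_messages(system_prompt: str, user_text: str, thinking: bool) -> list[dict]:
    """Assemble a single chat turn, appending the /think or /no_think tag."""
    switch = "/think" if thinking else "/no_think"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{user_text} {switch}"},
    ]

messages = build_messages(
    system_prompt="You are a catgirl assistant.",  # hypothetical persona prompt
    user_text="Explain why the sky is blue.",
    thinking=False,
)
print(messages[-1]["content"])  # ends with "/no_think"
```

In practice these messages would be passed to the tokenizer's chat template (Qwen3's template also exposes an `enable_thinking` flag for a global default); the finding above is that after this fine-tune the model no longer responds reliably to either switch.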
Current Status
Research on Qwen3 for this project is temporarily paused; the focus is shifting to improving the dataset's logic, creativity, and long-context coverage, and to fine-tuning a Qwen2.5-7B model. This test nonetheless provides valuable insight into the challenges of fine-tuning Qwen3, particularly its distinctive reasoning mechanisms and prompt sensitivity.