Overview
bknyaz/Qwen3-0.6B-Fr is a fine-tune of the Qwen/Qwen3-0.6B base model (roughly 0.8 billion parameters in total, with a 32768-token context length). It was trained by bknyaz on the curated kurakurai/luth-sft dataset, which combines subsets such as luth_smoltalk2, luth_aya_dataset, luth_croissantllm, and luth_tulu3_persona_instruct. Fine-tuning used the TRL library's SFT trainer with full-rank (non-LoRA) updates, as described in the associated blog post on meta-merge experiments.
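A minimal sketch of what such an SFT run looks like with TRL, assuming the standard `SFTTrainer`/`SFTConfig` API. The hyperparameter values below are illustrative placeholders, not the settings actually used for this model (those are not stated on the card):

```python
# Illustrative SFT recipe; hyperparameters are placeholders, not the
# actual training configuration of bknyaz/Qwen3-0.6B-Fr.
SFT_ARGS = {
    "output_dir": "Qwen3-0.6B-Fr",
    "max_length": 32768,        # matches the model's context length
    "learning_rate": 2e-5,      # placeholder value
    "num_train_epochs": 1,      # placeholder value
}

if __name__ == "__main__":
    # Heavy imports and training sit behind the main guard; running this
    # requires `pip install trl datasets` and a GPU.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("kurakurai/luth-sft", split="train")
    trainer = SFTTrainer(
        model="Qwen/Qwen3-0.6B",       # base model; full-rank (no LoRA config passed)
        train_dataset=dataset,
        args=SFTConfig(**SFT_ARGS),
    )
    trainer.train()
```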
Key Capabilities and Performance
The fine-tuning significantly enhanced the model's performance, particularly in French language tasks. Evaluation results show notable improvements over the base Qwen3-0.6B model:
- gsm8k: 21.0 → 36.1
- french_bench: 24.4 → 26.5
- gsm8k-fr: 19.6 → 26.5
- Average: 21.7 → 29.7
These results indicate stronger mathematical reasoning and, on the French-specific tasks, better language understanding. Evaluations were run with lm_eval on gsm8k, french_bench, and gsm8k-fr.
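The reported averages can be reproduced directly from the per-task scores above (unweighted mean over the three tasks, rounded to one decimal):

```python
# Per-task scores as reported on the card.
base  = {"gsm8k": 21.0, "french_bench": 24.4, "gsm8k-fr": 19.6}
tuned = {"gsm8k": 36.1, "french_bench": 26.5, "gsm8k-fr": 26.5}

def average(scores: dict) -> float:
    """Unweighted mean over tasks, rounded to one decimal place."""
    return round(sum(scores.values()) / len(scores), 1)

print(average(base), average(tuned))  # → 21.7 29.7
```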
Good for
- French Language Applications: Excels in tasks requiring understanding and generation in French.
- Conversational AI: Suitable for chatbots and interactive agents, given its training on conversational datasets.
- Instruction Following: Tuned on instruction-style data, so it responds well to direct task prompts.
- Research and Baselines: Provides a strong baseline for further fine-tuning or experimental work, particularly in multilingual or low-resource settings.
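A minimal inference sketch for the use cases above, assuming the standard transformers text-generation pipeline; the prompt and generation settings are illustrative:

```python
# Chat-style inference sketch; MODEL_ID and settings are illustrative.
MODEL_ID = "bknyaz/Qwen3-0.6B-Fr"

def build_messages(user_prompt: str) -> list[dict]:
    """Build the chat-format message list consumed by the chat template."""
    return [{"role": "user", "content": user_prompt}]

if __name__ == "__main__":
    # Model download and generation sit behind the main guard; running
    # this requires `pip install transformers torch`.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    messages = build_messages("Explique la photosynthèse en une phrase.")
    out = generator(messages, max_new_tokens=128)
    print(out[0]["generated_text"][-1]["content"])
```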