cs-552-2026-MandMP/multilingual_model
cs-552-2026-MandMP/multilingual_model is a Qwen3-1.7B base model fine-tuned by cs-552-2026-MandMP using LoRA with rank r=16. It targets multilingual understanding and was fine-tuned on the Global-MMLU dataset in Italian, Spanish, Chinese, Russian, and Hindi. The model gives a short English chain-of-thought before its answer, which makes it suitable for cross-lingual reasoning tasks.
Multilingual Model (MandMP) v3 Overview
cs-552-2026-MandMP/multilingual_model is a specialized language model built on the Qwen3-1.7B architecture. Developed by cs-552-2026-MandMP, this version (v3) was fine-tuned with LoRA (Low-Rank Adaptation) at rank 16. LoRA trains a pair of small low-rank matrices alongside each adapted weight matrix while the base weights stay frozen, so the fine-tuning adds only a small number of parameters relative to the base model.
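To make the size argument concrete, the following is a minimal sketch of the parameter arithmetic behind a rank-16 LoRA adapter. The 2048 x 2048 layer shape is an illustrative assumption, not a figure from this card, and the function names are hypothetical.

```python
def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter on a d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in); the weight update is B @ A.
    """
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen d_out x d_in weight matrix itself."""
    return d_in * d_out


# Illustrative layer shape only (assumed, not taken from the model card).
d_in = d_out = 2048
rank = 16  # the rank stated in the card

added = lora_extra_params(d_in, d_out, rank)
frozen = full_params(d_in, d_out)
print(f"adapter params: {added}, frozen params: {frozen}, ratio: {added / frozen:.4f}")
```

For this assumed shape the adapter holds 65,536 trainable parameters against 4,194,304 frozen ones. The real total depends on which layers the adapters were attached to, which the card does not state.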
Key Capabilities
- Multilingual Proficiency: The model's primary strength lies in its multilingual understanding, having been fine-tuned on the Global-MMLU dataset. This dataset includes content in Italian (it), Spanish (es), Chinese (zh), Russian (ru), and Hindi (hi), enabling the model to process and respond in these languages.
- Enhanced Reasoning: The model writes a short English chain-of-thought before giving its final boxed answer. The intermediate reasoning is meant to improve answer quality and makes it possible to inspect how the model reached an answer, including for questions posed in another language.
- Efficient Fine-tuning: The use of LoRA allows for efficient adaptation of the base Qwen3-1.7B model, making it a practical choice for applications requiring strong multilingual performance with a relatively compact footprint.
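Because the answer follows the chain-of-thought, downstream code usually needs to pull the final answer out of the generated text. The sketch below assumes the "boxed answer" uses the LaTeX-style `\boxed{...}` convention, which the card does not spell out; `extract_boxed_answer` is a hypothetical helper, not part of any library.

```python
import re
from typing import Optional

# Assumed output shape: free-form English reasoning, then \boxed{...}.
# Nested braces inside the box are not handled by this simple pattern.
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None if absent."""
    matches = _BOXED.findall(text)
    if not matches:
        return None
    # The final box is taken as the answer; earlier ones may be intermediate.
    return matches[-1].strip()


sample = "The question asks for the capital of Italy, which is Rome (option B). \\boxed{B}"
print(extract_boxed_answer(sample))
```

Taking the last match rather than the first is a design choice: if the reasoning mentions a boxed value along the way, the final one is the more likely answer.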
Good For
- Applications requiring multilingual question answering or text generation across Italian, Spanish, Chinese, Russian, and Hindi.
- Use cases where reasoning transparency is beneficial, thanks to the English chain-of-thought output.
- Developers looking for a resource-efficient multilingual model based on the Qwen3 architecture.
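For the question-answering use case, a rough sketch of querying the model with the Hugging Face `transformers` library might look like the following. Several things here are assumptions rather than facts from the card: that the repository can be loaded directly with `AutoModelForCausalLM` (an adapter-only repository would also need `peft` installed), that a plain-text prompt is acceptable, and the prompt layout itself, which is hypothetical.

```python
from typing import Sequence

MODEL_ID = "cs-552-2026-MandMP/multilingual_model"  # repository name from the card


def build_prompt(question: str, choices: Sequence[str]) -> str:
    """Format a multiple-choice question (hypothetical layout, not from the card)."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lines = [question]
    for letter, choice in zip(letters, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Reason briefly in English, then give the final answer as \\boxed{letter}.")
    return "\n".join(lines)


def answer(question: str, choices: Sequence[str], max_new_tokens: int = 256) -> str:
    """Generate a response; downloads the model on first use."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(question, choices), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `answer("¿Cuál es la capital de Italia?", ["Madrid", "Roma", "París", "Lisboa"])` would return the generated reasoning and answer as one string. If the model expects a chat-formatted prompt, the tokenizer's chat template would need to be applied instead of the plain string used here.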