cs-552-2026-MandMP/multilingual_model

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: May 14, 2026 | Architecture: Transformer | Warm

cs-552-2026-MandMP/multilingual_model is a Qwen3-1.7B base model fine-tuned by cs-552-2026-MandMP using LoRA with rank r=16. It targets multilingual understanding and was fine-tuned on the Global-MMLU dataset in Italian, Spanish, Chinese, Russian, and Hindi. It is designed to precede each answer with a short English chain-of-thought, which makes it suitable for cross-lingual reasoning tasks.


Multilingual Model (MandMP) v3 Overview

The cs-552-2026-MandMP/multilingual_model is a specialized language model built on the Qwen3-1.7B architecture. Developed by cs-552-2026-MandMP, this version (v3) was fine-tuned with LoRA (Low-Rank Adaptation) at a rank of 16. LoRA adapts the base model by training small low-rank update matrices rather than all of its weights, so the model's size barely changes.
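To give a feel for why a rank-16 LoRA adapter is small, the sketch below counts the parameters one adapter adds to a single weight matrix. The 2048 x 2048 layer shape is an assumption chosen for illustration and is not stated in the model card; only the rank of 16 comes from the card. The helper name `lora_added_params` is hypothetical.

```python
def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter on a d_out x d_in weight matrix.

    LoRA keeps the original weight W frozen and learns the update B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


# Illustrative (assumed) layer shape; not taken from the model card.
D_IN, D_OUT = 2048, 2048
RANK = 16  # the rank stated in the model card

full = D_IN * D_OUT
added = lora_added_params(D_IN, D_OUT, RANK)
print(f"full matrix: {full} params, LoRA adapter: {added} params")
print(f"adapter is {added / full:.2%} of the full matrix")
```

Under these assumed dimensions the adapter holds 1/64 of the parameters of the matrix it modifies. The real ratio depends on which layers were adapted, which the card does not say.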

Key Capabilities

  • Multilingual Proficiency: The model's primary strength lies in its multilingual understanding, having been fine-tuned on the Global-MMLU dataset. This dataset includes content in Italian (it), Spanish (es), Chinese (zh), Russian (ru), and Hindi (hi), enabling the model to process and respond in these languages.
  • Enhanced Reasoning: The model writes a short English chain-of-thought before giving its final boxed answer. This is intended to improve its reasoning and make that reasoning easier to inspect, particularly for complex multilingual queries.
  • Efficient Fine-tuning: The use of LoRA allows for efficient adaptation of the base Qwen3-1.7B model, making it a practical choice for applications requiring strong multilingual performance with a relatively compact footprint.
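Since the final answer follows the chain-of-thought, downstream code usually needs to pull it out of the generated text. The following is a minimal sketch, assuming the answer is wrapped in a LaTeX-style `\boxed{...}` marker; the card only says "boxed answer", so the exact marker is an assumption, and `extract_boxed_answer` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed answer format: \boxed{...} with no nested braces inside.
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None if absent."""
    matches = _BOXED.findall(text)
    return matches[-1].strip() if matches else None


sample = "Rome is the capital of Italy, so the answer is option B. \\boxed{B}"
print(extract_boxed_answer(sample))
```

Taking the last match rather than the first guards against a boxed expression that appears earlier inside the reasoning itself.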

Good For

  • Applications requiring multilingual question answering or text generation across Italian, Spanish, Chinese, Russian, and Hindi.
  • Use cases where reasoning transparency is beneficial, thanks to the English chain-of-thought output.
  • Developers looking for a resource-efficient multilingual model based on the Qwen3 architecture.
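For the question-answering use case, a rough sketch of prompting the model through the Hugging Face `transformers` library might look like the following. Several things here are assumptions and not taken from the card: the prompt layout produced by the hypothetical `build_prompt` helper, the generation length, and that the repository holds full weights loadable with `AutoModelForCausalLM` (if it holds only a LoRA adapter, it would need to be loaded with the `peft` library instead).

```python
from typing import Sequence

MODEL_ID = "cs-552-2026-MandMP/multilingual_model"


def build_prompt(question: str, options: Sequence[str]) -> str:
    """Format a multiple-choice question (hypothetical layout, not from the card)."""
    letters = "ABCDEFGHIJ"
    lines = [question]
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines)


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Run the model with Hugging Face transformers (imported lazily)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


prompt = build_prompt(
    "¿Cuál es la capital de Italia?",
    ["Madrid", "Roma", "París", "Lisboa"],
)
print(prompt)
```

Calling `generate(prompt)` downloads the weights on first use, so the example above only prints the prompt. The returned text should contain the English reasoning followed by the final answer.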