WiroAI/OpenR1-Qwen-7B-Italian

TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Mar 3, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

WiroAI/OpenR1-Qwen-7B-Italian is a 7.6 billion parameter language model developed by WiroAI, fine-tuned from Qwen2.5-Instruct with a 32768 token context length. This model is specifically optimized for Italian language reasoning, aiming to improve performance in a relatively low-resource language. It excels at step-by-step reasoning in Italian, making it suitable for tasks requiring detailed thought processes in this language.

Loading preview...

OpenR1-Qwen-7B-Italian Overview

This model, developed by WiroAI, is a 7.6 billion parameter language model fine-tuned from the Qwen2.5-Instruct architecture. Its primary focus is to enhance reasoning capabilities specifically for the Italian language, addressing the need for improved open-source models in relatively low-resource languages. The model was trained for 2 epochs on the WiroAI/dolphin-r1-Italian dataset, utilizing a learning rate of 1e-5 and a maximum sequence length of 4096 tokens, with training taking 5 days on an 8xA6000 ADA cluster.

Key Capabilities and Differentiators

  • Enhanced Italian Reasoning: The model demonstrates improved step-by-step reasoning processes in Italian compared to other models, which sometimes default to English or Chinese.
  • Specialized Fine-tuning: It is specifically fine-tuned on an Italian dataset to address language-specific nuances and improve cultural relevance.
  • Experimental Focus: Developed with experimental motives, it encourages community evaluation and contributions to democratize and culturally improve open-source models.

Usage Considerations

  • Token Generation: This model is designed to produce more tokens during inference, which can lead to better reasoning but also consumes more VRAM.
  • Evaluation Requirements: For accurate evaluation, it is crucial to allow the model to generate sufficient tokens, as restricting output to less than 4000 tokens may lead to suboptimal results.

This model is a valuable contribution for developers and researchers focusing on Italian natural language processing and reasoning tasks.