mlfoundations-dev/d1_math_multiple_languages

  • Task: Text generation
  • Model size: 7.6B parameters
  • Quantization: FP8
  • Context length: 32k
  • Concurrency cost: 1
  • Published: Apr 30, 2025
  • License: apache-2.0
  • Architecture: Transformer (open weights)

The mlfoundations-dev/d1_math_multiple_languages model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It is optimized for mathematical reasoning across multiple languages, and its fine-tuning on a specialized dataset is intended to improve accuracy on numerical and logical reasoning problems.


Overview

The model was fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model on the mlfoundations-dev/d1_math_multiple_languages dataset, a pairing that indicates a specialization in mathematical reasoning with support for multiple languages.

Key Capabilities

  • Mathematical Reasoning: Optimized for handling mathematical problems and queries.
  • Multilingual Support: Designed to process mathematical content across various languages.
  • Instruction Following: Inherits strong instruction-following capabilities from its Qwen2.5-7B-Instruct base.
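The model card ships no usage snippet, so the sketch below assumes the standard transformers chat-model pattern inherited from Qwen2.5-7B-Instruct; the prompt and generation settings are illustrative, not from the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed standard loading pattern for a Qwen2.5-based chat model.
model_id = "mlfoundations-dev/d1_math_multiple_languages"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative math prompt; the model uses the Qwen chat template.
messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```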

Training Details

The model was trained with a learning rate of 4e-05 for 5.0 epochs, using a total batch size of 128 distributed across 32 devices. The optimizer was ADAMW_TORCH with a cosine learning-rate scheduler and a warmup ratio of 0.1, a common recipe for stable fine-tuning of an instruction-tuned base.
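For reference, these hyperparameters map onto a transformers TrainingArguments configuration roughly as follows. This is a hypothetical reconstruction: the per-device batch size is an assumption, since the card states only the effective batch of 128 across 32 devices.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters.
args = TrainingArguments(
    output_dir="d1_math_multiple_languages",
    learning_rate=4e-5,
    num_train_epochs=5.0,
    per_device_train_batch_size=4,  # assumed: 4 x 32 devices = 128 effective
    optim="adamw_torch",            # ADAMW_TORCH as reported
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```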

Intended Use Cases

This model is suited to applications that require accurate mathematical problem-solving across languages, such as educational tools, scientific assistants, or any system that must interpret and generate mathematical content in multiple languages.
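As an illustration of the multilingual use case, the following sketch poses the same word problem in several languages via the transformers pipeline API. The prompts, languages, and generation settings are assumptions for demonstration, not taken from the card.

```python
from transformers import pipeline

# Illustrative multilingual demo; requires a recent transformers version
# that accepts chat-format messages in the text-generation pipeline.
generator = pipeline(
    "text-generation",
    model="mlfoundations-dev/d1_math_multiple_languages",
    device_map="auto",
)

# The same problem phrased in three languages (prompts are assumptions).
question = {
    "English": "A train travels 180 km in 2 hours. What is its average speed?",
    "Spanish": "Un tren recorre 180 km en 2 horas. ¿Cuál es su velocidad media?",
    "French": "Un train parcourt 180 km en 2 heures. Quelle est sa vitesse moyenne ?",
}

for lang, prompt in question.items():
    result = generator(
        [{"role": "user", "content": prompt}], max_new_tokens=128
    )
    # The pipeline returns the full chat; the last message is the reply.
    print(f"[{lang}] {result[0]['generated_text'][-1]['content']}")
```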