xw1234gan/Extended_Merging_Qwen2.5-3B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kPublished:Mar 29, 2026Architecture:Transformer Cold

The xw1234gan/Extended_Merging_Qwen2.5-3B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42 is a 3.1 billion parameter instruction-tuned language model based on the Qwen2.5 architecture. This model is specifically designed and fine-tuned for mathematical reasoning and problem-solving tasks. It leverages an extended merging technique to enhance its capabilities in numerical and logical operations. Its primary strength lies in handling complex mathematical queries and generating accurate solutions.

Loading preview...

Model Overview

This model, xw1234gan/Extended_Merging_Qwen2.5-3B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42, is a 3.1 billion parameter instruction-tuned language model built upon the Qwen2.5 architecture. It has been developed with a specific focus on enhancing its performance in mathematical domains through an extended merging technique during its training process.

Key Characteristics

  • Base Architecture: Qwen2.5-3B-Instruct, providing a robust foundation for instruction following.
  • Parameter Count: 3.1 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a context length of 32768 tokens, allowing for processing of longer mathematical problems or related instructions.
  • Specialized Training: Fine-tuned with a learning rate of 1e-05, a micro-batch size of 2, and gradient accumulation of 128, over 2048 steps, with a seed of 42, indicating a focused training regimen.

Intended Use Cases

This model is particularly well-suited for applications requiring strong mathematical reasoning and problem-solving capabilities. While specific benchmarks are not provided in the model card, its naming convention and training parameters suggest an optimization for:

  • Solving mathematical equations and word problems.
  • Assisting in educational tools for math students.
  • Generating explanations for mathematical concepts.
  • Applications where numerical accuracy and logical deduction are paramount.