namezz/lvm-math-0402-a-qwen2.5-7b-instruct-b-qwen2.5-1.5b-instruct
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 3, 2026 · License: other · Architecture: Transformer

The namezz/lvm-math-0402-a-qwen2.5-7b-instruct-b-qwen2.5-1.5b-instruct model is a 1.5-billion-parameter instruction-tuned language model based on Qwen2.5-1.5B-Instruct. It has been fine-tuned on a mathematical dataset (7b_math_95k_16_train) and is optimized for mathematical reasoning tasks. On its evaluation set it reports a low final loss (0.0051) and token mean relative error (0.2861), making it suitable for applications requiring numerical accuracy.


Model Overview

This model, namezz/lvm-math-0402-a-qwen2.5-7b-instruct-b-qwen2.5-1.5b-instruct, is a specialized version of the Qwen2.5-1.5B-Instruct architecture, featuring 1.5 billion parameters and a 32768-token context length. It has undergone fine-tuning specifically on the 7b_math_95k_16_train dataset, indicating a strong focus on mathematical reasoning and problem-solving capabilities.
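To illustrate how the model can be loaded and queried, here is a minimal inference sketch using the Hugging Face transformers library. The chat-template call follows standard Qwen2.5 conventions; the example prompt, max_new_tokens value, and device_map setting are illustrative assumptions rather than settings published with this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "namezz/lvm-math-0402-a-qwen2.5-7b-instruct-b-qwen2.5-1.5b-instruct"

# Load the tokenizer and the weights in BF16 (matching the published precision).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat-formatted prompt; the math question is only an illustrative example.
messages = [
    {"role": "user", "content": "Solve for x: 3x + 7 = 22. Show your steps."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a response; max_new_tokens is an assumed value, not from the model card.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```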

Key Capabilities

  • Mathematical Proficiency: Fine-tuned on a dedicated math dataset, suggesting enhanced performance in numerical and mathematical tasks.
  • Optimized for Accuracy: Achieves a final loss of 0.0051 and a Token Mean Relative Error of 0.2861 on the evaluation set, indicating a focus on precision.
  • Qwen2.5 Base: Leverages the robust architecture of Qwen2.5-1.5B-Instruct, providing a solid foundation for instruction-following.

Training Details

The model was trained with a learning rate of 2e-05, a total batch size of 1024 (across 4 GPUs with gradient accumulation), and for 2 epochs. The training utilized the fused AdamW optimizer (adamw_torch_fused) and a cosine learning rate scheduler with 50 warmup steps.
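These settings map directly onto a standard Hugging Face TrainingArguments configuration, sketched below. Note that only the effective total batch size of 1024 across 4 GPUs is reported, so the split into a per-device batch size of 16 with 16 gradient-accumulation steps is an assumption, as is the output directory name.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; the per-device batch size / accumulation
# split is assumed — only the effective total batch size of 1024 is stated.
training_args = TrainingArguments(
    output_dir="lvm-math-0402",        # placeholder output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,    # assumed: 16 x 16 accum x 4 GPUs = 1024
    gradient_accumulation_steps=16,    # assumed split (see above)
    num_train_epochs=2,
    optim="adamw_torch_fused",         # fused AdamW, as reported
    lr_scheduler_type="cosine",
    warmup_steps=50,
    bf16=True,                         # matches the BF16 precision of the release
)
```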

Good for

  • Applications requiring mathematical problem-solving.
  • Tasks where numerical accuracy is critical.
  • Use cases benefiting from a compact yet specialized instruction-tuned model.