tengfeima-ai/Qwen2.5-0.5B-Math-SFT-1024
tengfeima-ai/Qwen2.5-0.5B-Math-SFT-1024 is a 0.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-0.5B-Instruct on the deepmath_sft_1024 dataset. It is optimized for mathematical reasoning and problem-solving and supports a 32,768-token context length.
Overview
tengfeima-ai/Qwen2.5-0.5B-Math-SFT-1024 is a specialized 0.5-billion-parameter language model: a supervised fine-tune of the Qwen/Qwen2.5-0.5B-Instruct base model, trained on the deepmath_sft_1024 dataset.
Key Capabilities
- Mathematical Task Optimization: Fine-tuning on the math-focused deepmath_sft_1024 dataset targets stronger performance on mathematical reasoning and problem-solving tasks.
- Context Length: A 32,768-token context window allows the model to process long mathematical problems and supporting material.
- Performance: Training reached a final validation loss of 0.6684 on the fine-tuning dataset.
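The model can be loaded with the standard Transformers API. The sketch below is a minimal inference example, assuming the model keeps the Qwen2.5-Instruct chat-template convention of its base model; the prompt, dtype, and generation settings are illustrative, not taken from the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tengfeima-ai/Qwen2.5-0.5B-Math-SFT-1024"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16-capable GPU; use torch.float32 on CPU
    device_map="auto",
)

# Chat-style prompt in the Qwen2.5-Instruct format; the question is illustrative.
messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show your steps."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```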
Training Details
The model was trained for 3 epochs with a cosine learning rate scheduler, a learning rate of 2e-05, a per-device batch size of 4, and 8 gradient accumulation steps, for a reported total training batch size of 64 (consistent with training across two devices). Training used Transformers 4.57.1 and PyTorch 2.4.1+cu124.
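These hyperparameters map directly onto the Transformers TrainingArguments API. The sketch below reconstructs that configuration; it is not the authors' actual training script, and output_dir, precision, and the evaluation/save cadence are assumptions.

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters. output_dir, bf16, and the
# evaluation/save cadence are illustrative assumptions, not from the card.
training_args = TrainingArguments(
    output_dir="Qwen2.5-0.5B-Math-SFT-1024",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # reported total train batch size: 64
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    bf16=True,
    eval_strategy="epoch",
    save_strategy="epoch",
)
```

A supervised fine-tuning loop (for example, TRL's SFTTrainer) would combine these arguments with the deepmath_sft_1024 data; how that dataset is hosted and formatted is not specified on the card.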
Good for
- Applications requiring a compact model for mathematical problem-solving.
- Research and development in mathematical reasoning with LLMs.
- Scenarios where a specialized, smaller model is preferred over larger, general-purpose alternatives for math-centric tasks.