Model Overview
anujjamwal/hcot-qwen2.5-math-1.5b is a 1.5-billion-parameter language model fine-tuned by anujjamwal. It builds on Qwen/Qwen2.5-Math-1.5B, a base architecture designed specifically for mathematical tasks, and this iteration has undergone further fine-tuning aimed at improving mathematical reasoning and problem solving.
Key Characteristics
- Base Model: Qwen2.5-Math-1.5B, known for its mathematical capabilities.
- Parameter Count: 1.5 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports a context window of 32,768 tokens, enabling it to process longer mathematical problems and multi-step derivations.
- Fine-tuning Focus: Optimized for mathematical reasoning, suggesting improved accuracy and understanding in this domain.
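Assuming the checkpoint is published on the Hugging Face Hub under the repo id above, it can be loaded with the standard `transformers` API. The sketch below is illustrative, not an official usage snippet from this card; the repo id comes from the card, everything else (function names, generation settings) is an assumption:

```python
# Sketch: loading and querying the model with Hugging Face Transformers.
# Requires `transformers` and `torch`; only MODEL_ID comes from the card.
MODEL_ID = "anujjamwal/hcot-qwen2.5-math-1.5b"

def load(model_id: str = MODEL_ID):
    # Imports are deferred so this module can be read without heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model

def solve(problem: str, tokenizer, model, max_new_tokens: int = 512) -> str:
    inputs = tokenizer(problem, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens, keep only the generated continuation.
    generated = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

# Example (downloads the weights on first run):
# tokenizer, model = load()
# print(solve("Solve for x: 2x + 3 = 11.", tokenizer, model))
```

At 1.5B parameters, the model fits comfortably on a single consumer GPU, and `device_map="auto"` will fall back to CPU if none is available.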
Training Details
The model was trained with a learning rate of 2e-05, an effective batch size of 8 (train_batch_size of 2 with 4 gradient_accumulation_steps), and 10 epochs. It used the adamw_torch_fused optimizer and a linear learning-rate scheduler with a warmup ratio of 0.1. Training was conducted with Transformers 5.0.0, PyTorch 2.10.0+cu128, Datasets 4.0.0, and Tokenizers 0.22.2.
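The schedule these numbers imply is easy to check with a little arithmetic. The sketch below recomputes the effective batch size and warmup length; the dataset size is a made-up placeholder, since the card does not state it:

```python
# Recompute derived training quantities from the card's hyperparameters.
TRAIN_BATCH_SIZE = 2       # per-device batch size, from the card
GRAD_ACCUM_STEPS = 4       # from the card
EPOCHS = 10                # from the card
WARMUP_RATIO = 0.1         # the card's "0.1 warmup" read as a ratio
DATASET_SIZE = 10_000      # hypothetical placeholder, NOT from the card

effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS   # optimizer-step batch
steps_per_epoch = -(-DATASET_SIZE // effective_batch)   # ceiling division
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(total_steps * WARMUP_RATIO)

print(effective_batch, total_steps, warmup_steps)  # -> 8 12500 1250
```

With these placeholder numbers, the learning rate would ramp linearly over the first 1,250 optimizer steps and decay linearly over the rest.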
Intended Use Cases
This model is well suited to applications that require strong mathematical understanding and problem solving. The card does not detail specific use cases, but its mathematical specialization makes it a natural fit for tasks such as:
- Solving mathematical equations and problems.
- Assisting in scientific computations.
- Generating mathematical explanations or proofs.
- Educational tools for mathematics.
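For the problem-solving use cases above, Qwen2.5-Math-family models are commonly prompted in chat format with a system message requesting step-by-step reasoning. The sketch below builds such a message list in the shape expected by `tokenizer.apply_chat_template`; the system-prompt wording is an assumption, not taken from this card:

```python
# Build a chat-style prompt for a math problem. The message structure
# matches the common `apply_chat_template` input format; the system
# prompt text is illustrative, not from the model card.
def math_messages(problem: str) -> list:
    return [
        {"role": "system",
         "content": "Please reason step by step, and put your final "
                    "answer within \\boxed{}."},
        {"role": "user", "content": problem},
    ]

messages = math_messages("What is the sum of the first 100 positive integers?")
print(messages[1]["content"])
```

Passing this list to the tokenizer's chat template (with `add_generation_prompt=True`) yields the final prompt string for `generate`.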