Overview
modaserMoj/csc415-phase1-0.5b-fast is a 0.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-0.5B base model. It supports a context length of 32768 tokens, allowing it to process extensive inputs and maintain coherence over long conversations or documents. The model was trained with the TRL framework.
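The card does not include a usage snippet, so below is a minimal inference sketch assuming the standard Transformers causal-LM loading path and the chat template that Qwen2.5-family checkpoints ship with; the prompt is only an illustration.

```python
# Minimal inference sketch; assumes the standard Transformers causal-LM API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "modaserMoj/csc415-phase1-0.5b-fast"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Qwen2.5-based checkpoints ship a chat template, so a chat-style prompt is used.
messages = [{"role": "user", "content": "What is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```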
Key Capabilities
- Enhanced Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, with the aim of improving performance on mathematical reasoning tasks.
- Long Context Understanding: The 32768-token context window lets the model handle complex queries and generate detailed responses that draw on large amounts of input (see the sketch after this list for a quick way to confirm the configured window).
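A quick way to verify the advertised context length is to read it from the model config; this assumes the checkpoint exposes max_position_embeddings, as Qwen2.5-family configs do.

```python
# Check the configured context window without downloading the full weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("modaserMoj/csc415-phase1-0.5b-fast")
print(config.max_position_embeddings)  # expected: 32768 per this card
```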
Training Details
The model's fine-tuning process used GRPO, a technique designed to push the limits of mathematical reasoning in open language models. The training environment included TRL 0.29.0, Transformers 5.3.0, PyTorch 2.10.0+cu128, Datasets 4.6.1, and Tokenizers 0.22.2.
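The exact dataset and reward function used for this checkpoint are not documented here. The sketch below shows the general shape of a GRPO run with TRL's GRPOTrainer, using a placeholder prompt dataset (trl-lib/tldr) and a hypothetical length-based reward; it is not the author's actual recipe.

```python
# Minimal GRPO training sketch with TRL's GRPOTrainer. The dataset and reward
# below are placeholders, not the ones used to produce this checkpoint.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical reward: prefer completions close to 200 characters long.
def reward_len(completions, **kwargs):
    return [-abs(200 - len(completion)) for completion in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder prompts

training_args = GRPOConfig(output_dir="qwen2.5-0.5b-grpo")
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and scores each with the reward function, optimizing the policy against the group-relative advantages; any callable that maps completions to per-sample scores can stand in for the reward here.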
Good For
- Applications requiring strong mathematical problem-solving.
- Tasks benefiting from processing and generating long sequences of text.
- Research and development in mathematical AI.