modaserMoj/csc415-phase1-0.5b-fast

Text generation · Model size: 0.5B · Quantization: BF16 · Context length: 32k · Published: Mar 8, 2026 · Architecture: Transformer

modaserMoj/csc415-phase1-0.5b-fast is a 0.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-0.5B, with a 32,768-token context length. It was trained with the TRL framework using the GRPO method, which targets improved mathematical reasoning. The model is intended for tasks requiring robust mathematical problem-solving, making it suitable for applications in scientific computing and quantitative analysis.


Overview

modaserMoj/csc415-phase1-0.5b-fast is a 0.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-0.5B base model. Its 32,768-token context length lets it process extensive inputs and maintain coherence over long conversations or documents. Fine-tuning was performed with the TRL framework.
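Below is a minimal sketch of loading the model for inference with the Transformers library; the prompt and generation settings are illustrative assumptions rather than recommendations from the model card.

```python
# Hedged example: load modaserMoj/csc415-phase1-0.5b-fast and generate a reply
# to a simple math prompt. The prompt and max_new_tokens are placeholder choices.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "modaserMoj/csc415-phase1-0.5b-fast"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "What is the sum of the first 100 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```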

Key Capabilities

  • Enhanced Mathematical Reasoning: This model was trained using the GRPO (Group Relative Policy Optimization) method, introduced in the DeepSeekMath paper. This training approach aims to significantly improve performance on mathematical reasoning tasks.
  • Long Context Understanding: With a 32,768-token context window, the model can handle complex queries and generate detailed responses that require understanding large amounts of information (a quick token-count check is sketched after this list).
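To illustrate the 32k window, the following hedged snippet counts the tokens in a long input with the model's tokenizer before sending it for generation; the file path is a placeholder assumption.

```python
# Hedged example: verify that a long document fits in the 32,768-token context
# window before generation. "long_document.txt" is a placeholder path.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("modaserMoj/csc415-phase1-0.5b-fast")

with open("long_document.txt") as f:
    text = f.read()

n_tokens = len(tokenizer(text)["input_ids"])
print(f"{n_tokens} tokens; fits in 32k context: {n_tokens <= 32768}")
```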

Training Details

The fine-tuning process used GRPO, a technique designed to push the limits of mathematical reasoning in open language models. The training environment included TRL 0.29.0, Transformers 5.3.0, PyTorch 2.10.0+cu128, Datasets 4.6.1, and Tokenizers 0.22.2.
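The exact training recipe is not published here, but a minimal GRPO fine-tuning sketch with TRL's GRPOTrainer looks roughly like the following; the dataset, reward function, and hyperparameters are assumptions for illustration, not the settings used for this model.

```python
# Hedged GRPO sketch with TRL. The dataset and toy length-based reward are
# placeholders; a real math-reasoning setup would reward verified correctness.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder prompt dataset

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 200 characters.
    return [-abs(len(completion) - 200) for completion in completions]

training_args = GRPOConfig(output_dir="csc415-phase1-grpo")
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and uses each completion's reward relative to the group as its advantage signal, which lets it optimize reasoning quality without a separate value model.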

Good For

  • Applications requiring strong mathematical problem-solving.
  • Tasks benefiting from processing and generating long sequences of text.
  • Research and development in mathematical AI.