Pradheep1647/qwen2.5-0.5b-instruct-openai-gsm8k-dppo-topk

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.5BQuant:BF16Ctx Length:32kPublished:May 22, 2026Architecture:Transformer Warm

Pradheep1647/qwen2.5-0.5b-instruct-openai-gsm8k-dppo-topk is a 0.5 billion parameter Qwen2.5-Instruct model fine-tuned using DPPO-topk on a subset of the OpenAI GSM8K dataset. This model is an experimental run focused on improving mathematical reasoning for small models, specifically targeting arithmetic problem-solving. It aims to enhance the model's ability to produce correct numeric answers and parseable final answers in a step-by-step reasoning format.

Loading preview...

Model Overview

This model, Pradheep1647/qwen2.5-0.5b-instruct-openai-gsm8k-dppo-topk, is a small-scale experimental fine-tune of the Qwen/Qwen2.5-0.5B-Instruct base model. It utilizes the DPPO-topk method for post-training on a subset of the openai/gsm8k dataset, which is designed for mathematical word problems.

Key Characteristics

  • Base Model: Qwen2.5-0.5B-Instruct, a 0.5 billion parameter model.
  • Fine-tuning Method: DPPO-topk (Direct Preference Optimization with top-k sampling).
  • Training Data: A small subset of 400 samples from the openai/gsm8k dataset, with 100 samples used for evaluation.
  • Prompt Format: Emphasizes step-by-step reasoning, with the final answer expected after ####.
  • Reward System: Rewards +1.0 for a correct final numeric answer and +0.1 for a parseable final answer, indicating a focus on structured output and accuracy in mathematical contexts.

Experimental Focus

This model represents a small, controlled experiment rather than a benchmark-setting release. Its primary purpose is to explore the effectiveness of DPPO-topk on a mathematical reasoning dataset for a compact model. While the reported eval_acc is 0.11, the experiment provides insights into the training process and reward shaping for such tasks. It is not intended for production use but as a demonstration of a specific fine-tuning approach for arithmetic problem-solving.