Name: Pradheep1647/qwen2.5-0.5b-instruct-openai-gsm8k-dppo-topk API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Pradheep1647

Model Overview

Pradheep1647/qwen2.5-0.5b-instruct-openai-gsm8k-dppo-topk is a small experimental fine-tune of the Qwen/Qwen2.5-0.5B-Instruct base model. It was post-trained with the DPPO-topk method on a subset of the openai/gsm8k dataset, a collection of grade-school math word problems.

Key Characteristics

Base Model: Qwen2.5-0.5B-Instruct, a 0.5 billion parameter model.
Fine-tuning Method: DPPO-topk (Direct Preference Optimization with top-k sampling).
Training Data: A small subset of 400 samples from the openai/gsm8k dataset, with 100 samples used for evaluation.
Prompt Format: Prompts ask for step-by-step reasoning, with the final answer expected after the #### marker.
Reward System: +1.0 for a correct final numeric answer and +0.1 for a parseable final answer, so the reward targets both answer accuracy and well-structured output.
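
The prompt format and reward scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the author's training code: the function names, the regular expression, the numeric tolerance, and the choice to add the two reward terms together (so a correct answer scores 1.1) are all assumptions made for the example.

```python
import math
import re
from typing import Optional

# Assumption: the final answer is whatever follows the last "####" marker.
_FINAL_ANSWER = re.compile(r"####\s*([^\n]*)")


def extract_final_answer(completion: str) -> Optional[float]:
    """Return the number after the last '####', or None if it cannot be parsed."""
    matches = _FINAL_ANSWER.findall(completion)
    if not matches:
        return None
    # Strip common formatting such as thousands separators and currency signs.
    raw = matches[-1].strip().replace(",", "").replace("$", "")
    try:
        value = float(raw)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def reward(completion: str, reference: float, tol: float = 1e-6) -> float:
    """Hypothetical reward: +0.1 if parseable, plus +1.0 if numerically correct."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    score = 0.1  # parseable final answer
    if abs(predicted - reference) <= tol:
        score += 1.0  # correct final numeric answer
    return score


if __name__ == "__main__":
    sample = "Tom has 3 apples and buys 4 more. 3 + 4 = 7.\n#### 7"
    print(reward(sample, 7.0))
```

Under these assumptions, a completion with no #### marker scores 0.0, a parseable but wrong answer scores 0.1, and a correct answer scores 1.1. If the two terms were meant to be mutually exclusive instead, only the final addition would change.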

Experimental Focus

This model is a small, controlled experiment rather than a benchmark-setting release. Its primary purpose is to explore how well DPPO-topk works on a mathematical reasoning dataset with a compact model. The reported eval_acc is 0.11, so the value of the experiment lies in what it shows about the training process and reward shaping rather than in the score itself. It is not intended for production use; it demonstrates a specific fine-tuning approach for arithmetic problem-solving.
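
To put the reported eval_acc in context, the sketch below computes an accuracy and its binomial standard error. It assumes the score was measured on the 100 evaluation samples mentioned above, which would correspond to 11 correct answers; that count is inferred for illustration and is not stated in the listing. The function name is hypothetical.

```python
import math
from typing import Tuple


def accuracy_with_stderr(num_correct: int, num_examples: int) -> Tuple[float, float]:
    """Return (accuracy, binomial standard error) for an evaluation run."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    acc = num_correct / num_examples
    # Standard error of a proportion: sqrt(p * (1 - p) / n).
    stderr = math.sqrt(acc * (1.0 - acc) / num_examples)
    return acc, stderr


if __name__ == "__main__":
    # Assumption: 11 correct out of 100 evaluation samples.
    acc, stderr = accuracy_with_stderr(11, 100)
    print(f"eval_acc = {acc:.2f} +/- {stderr:.3f}")  # eval_acc = 0.11 +/- 0.031
```

With an evaluation set this small, the standard error is about three percentage points, which is one reason to read the result as a demonstration rather than a benchmark figure.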
