nimishbongale/qwen-2.5-0.5b-grpo-rlcot-gsm8k

Hosted on Hugging Face · Text generation · Model size: 0.5B · Quantization: BF16 · Context length: 32K · License: apache-2.0 · Architecture: Transformer · Open weights

nimishbongale/qwen-2.5-0.5b-grpo-rlcot-gsm8k is a 0.5-billion-parameter model based on the Qwen 2.5 architecture and fine-tuned for mathematical reasoning. It was trained with GRPO (Group Relative Policy Optimization) and RLCoT (reinforcement learning from chain-of-thought), targeting performance on the GSM8K dataset, and its training curves suggest headroom for higher accuracy on arithmetic and word problems.


nimishbongale/qwen-2.5-0.5b-grpo-rlcot-gsm8k Overview

This model is a compact 0.5-billion-parameter variant of the Qwen 2.5 architecture, developed by nimishbongale. It has been fine-tuned with a combination of Group Relative Policy Optimization (GRPO) and Reinforcement Learning from Chain-of-Thought (RLCoT). The primary objective of this training regimen is to strengthen mathematical reasoning, particularly on the GSM8K dataset.
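As a quick orientation, here is a minimal inference sketch using the standard transformers API. It assumes the fine-tune retains the base Qwen 2.5 chat template; check the repository's tokenizer configuration before relying on that.

```python
# Minimal inference sketch. Assumes the checkpoint keeps the base Qwen 2.5
# chat template -- verify against the repo's tokenizer config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nimishbongale/qwen-2.5-0.5b-grpo-rlcot-gsm8k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# A GSM8K-style word problem as a single user turn.
messages = [{
    "role": "user",
    "content": "Natalia sold clips to 48 of her friends in April, and then "
               "she sold half as many clips in May. How many clips did "
               "Natalia sell altogether in April and May?",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```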

Key Capabilities

  • Mathematical Reasoning: Optimized for solving arithmetic and word problems, as evidenced by its GSM8K training (see the answer-extraction sketch after this list).
  • Efficient Size: At 0.5 billion parameters, it has a small memory and compute footprint while retaining specialized reasoning ability.
  • Context Length: Supports a 32K-token (32,768) context window, useful for long multi-step problems.
  • Training Potential: The training run to date suggests headroom, with further accuracy gains expected from additional epochs.
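GSM8K reference solutions end with a `#### <number>` line, and evaluation typically compares only that final number. Whether this fine-tune emits the same terminator is an assumption; the hypothetical extract_final_answer helper below sketches the usual extraction, with a fallback to the last number in the text.

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the final numeric answer from a GSM8K-style completion.

    Assumes the model follows GSM8K's '#### <number>' convention; if the
    marker is absent, falls back to the last number in the text.
    """
    marker = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", completion)
    if marker:
        return marker.group(1).replace(",", "")
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    return numbers[-1].replace(",", "") if numbers else None

# Example: both reasoning text and the terminator are handled.
assert extract_final_answer("48 + 24 = 72 clips.\n#### 72") == "72"
```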

Good For

  • Educational Applications: Well suited to mathematical problem-solving features in educational technology.
  • Research in RLCoT/GRPO: A useful base for researchers exploring reinforcement-learning techniques for reasoning tasks (see the training sketch after this list).
  • Resource-Constrained Environments: Its small size makes it suitable for deployment where compute is limited but mathematical reasoning is still required.
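For the research use case above, a GRPO run on GSM8K can be outlined with TRL's GRPOTrainer. This is a sketch under assumptions: the base model name, the exact-match reward, and the hyperparameters are illustrative, not the author's actual recipe.

```python
# Illustrative GRPO-on-GSM8K sketch using TRL's GRPOTrainer.
# Base model, reward, and hyperparameters are assumptions, not the author's recipe.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {
    "prompt": x["question"],
    "answer": x["answer"].split("####")[-1].strip(),  # keep only the gold number
})

def correctness_reward(completions, answer, **kwargs):
    # 1.0 when the completion's final '#### <number>' matches the gold answer.
    rewards = []
    for completion, gold in zip(completions, answer):
        predicted = completion.split("####")[-1].strip().replace(",", "")
        rewards.append(1.0 if predicted == gold else 0.0)
    return rewards

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # assumed base; the card does not name it
    reward_funcs=correctness_reward,
    args=GRPOConfig(output_dir="qwen2.5-0.5b-grpo-gsm8k", num_generations=8),
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples several completions per prompt (num_generations) and uses reward differences within each group as the advantage signal, which is why a simple binary correctness reward is enough to provide a learning signal.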