eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork
eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork is a 0.5 billion parameter instruction-tuned causal language model, fine-tuned from Gensyn/Qwen2.5-0.5B-Instruct. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is primarily suited for tasks requiring improved logical and mathematical problem-solving, building upon its base Qwen2.5 architecture.
Overview
This model, eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork, is a 0.5 billion parameter instruction-tuned language model. It is a fine-tuned variant of Gensyn's Qwen2.5-0.5B-Instruct base model (Gensyn/Qwen2.5-0.5B-Instruct). The fine-tuning was performed with the TRL (Transformer Reinforcement Learning) framework.
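The model card itself does not include a usage snippet, so the following is a minimal sketch of how an instruction-tuned Qwen2.5 checkpoint like this one is typically run with the Transformers chat API; the prompt and generation settings are illustrative, not taken from the source.

```python
# Hypothetical inference sketch (model id from the card; the prompt and
# generation settings below are illustrative assumptions).
model_id = "eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork"

# Instruction-tuned Qwen2.5 models expect a chat-style message list.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 17 * 23?"},
]

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # apply_chat_template formats the messages with Qwen's chat template
    # and returns input ids ready for generation.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=128)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                           skip_special_tokens=True))
```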
Key Capabilities
- Enhanced Mathematical Reasoning: This model was specifically trained using the GRPO (Group Relative Policy Optimization) method. GRPO is known for pushing the limits of mathematical reasoning in language models, as introduced in the DeepSeekMath paper.
- Instruction Following: As an instruction-tuned model, it is designed to follow user prompts and generate relevant responses.
Good For
- Applications requiring improved logical and mathematical problem-solving capabilities.
- Tasks where a smaller, efficient model with specialized reasoning enhancements is beneficial.
- Experimentation with models fine-tuned using advanced reinforcement learning techniques like GRPO.
Training Details
The model's training procedure leveraged the GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The training was conducted using TRL version 0.15.2, with Transformers 4.51.3 and PyTorch 2.6.0.
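To make the GRPO idea concrete: the method samples a group of completions per prompt, scores each with a reward function, and normalizes rewards against the group's mean and standard deviation to get per-completion advantages (no learned value model). The sketch below, in plain Python, illustrates that core step with a hypothetical verifiable-correctness reward of the kind commonly used for math tasks; the helper names and the exact normalization details are assumptions, not taken from this model's training setup.

```python
import re
import statistics

def correctness_reward(completion: str, ground_truth: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the last number in the
    completion equals the ground-truth answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == ground_truth else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO's core step: normalize each sampled completion's reward
    against its own group's mean and standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]  # all rewards equal: no signal
    return [(r - mean) / std for r in rewards]

# Four sampled completions for the prompt "What is 12 * 7?"
completions = [
    "12 * 7 = 84. The answer is 84",
    "The answer is 74",
    "12 times 7 equals 84",
    "I think it is 85",
]
rewards = [correctness_reward(c, "84") for c in completions]
advantages = group_relative_advantages(rewards)
# Correct completions get positive advantages, incorrect ones negative;
# the policy update then pushes probability toward the former.
```

In TRL, a reward function of this shape is what `GRPOTrainer` consumes; the group normalization is handled internally by the trainer.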