vigilantETH/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-mangy_knobby_tuna
vigilantETH/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-mangy_knobby_tuna is a 0.5-billion-parameter instruction-tuned causal language model, fine-tuned from Gensyn/Qwen2.5-0.5B-Instruct. It was trained with the TRL framework using the GRPO method, which is designed to enhance mathematical reasoning. With a context length of 131,072 tokens, it is suited to tasks that require long-range contextual understanding alongside improved mathematical problem-solving.
Overview
vigilantETH/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-mangy_knobby_tuna is a 0.5-billion-parameter instruction-tuned model built on the Gensyn/Qwen2.5-0.5B-Instruct base. Its fine-tuning was performed with the TRL (Transformer Reinforcement Learning) framework.
Key Capabilities
- Enhanced Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), a method introduced in the DeepSeekMath paper specifically to improve mathematical reasoning.
- Instruction Following: As an instruction-tuned model, it is designed to understand and execute user prompts effectively.
- Large Context Window: Supports a context length of 131,072 tokens, allowing it to process and generate long, coherent texts while maintaining contextual awareness.
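A minimal inference sketch with the `transformers` library is shown below. It assumes the checkpoint is reachable on the Hugging Face Hub and that the tokenizer ships a chat template (standard for Qwen2.5-Instruct derivatives); the example question is illustrative.

```python
MODEL_ID = "vigilantETH/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-mangy_knobby_tuna"

def build_prompt(tokenizer, question):
    """Format a single-turn instruction using the tokenizer's chat template."""
    messages = [{"role": "user", "content": question}]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

if __name__ == "__main__":
    # Imports kept here so the helper above stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    prompt = build_prompt(tokenizer, "What is 17 * 24?")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)

    # Decode only the newly generated tokens, not the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Using `apply_chat_template` rather than a hand-written prompt string keeps the input consistent with whatever format the model saw during instruction tuning.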
Training Details
The model's training incorporated GRPO, a technique detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). Fine-tuning was performed with TRL 0.15.2, Transformers 4.51.3, and PyTorch 2.6.0.
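GRPO optimizes the policy against one or more reward functions scored per completion. As a rough illustration, the sketch below shows a verifiable-reward function in the shape TRL's `GRPOTrainer` accepts (a callable over completions returning one float each); the answer-extraction convention and the `target` value are illustrative assumptions, not details of this model's actual training run.

```python
import re

def correctness_reward(completions, target="408", **kwargs):
    """Return 1.0 for completions whose final number equals `target`, else 0.0.

    Matching on the last number in the text is an illustrative convention;
    real setups typically parse a structured answer format instead.
    """
    rewards = []
    for text in completions:
        numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
        rewards.append(1.0 if numbers and numbers[-1] == target else 0.0)
    return rewards
```

Binary, automatically checkable rewards like this are what make mathematical reasoning a natural fit for GRPO-style training: correctness can be verified without a learned reward model.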
Good For
- Applications requiring strong mathematical reasoning.
- Tasks benefiting from a large context window for processing extensive inputs or generating detailed outputs.
- Instruction-following scenarios where a compact yet capable model is desired.