Name: Degandance/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-freckled_waddling_viper API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Degandance

Model Overview

Degandance/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-freckled_waddling_viper is a 0.5 billion parameter instruction-tuned language model, building upon the unsloth/Qwen2.5-0.5B-Instruct base. It was fine-tuned using the TRL library, specifically incorporating the GRPO (Gradient-based Reward Policy Optimization) method.

Key Capabilities

Enhanced Mathematical Reasoning: The model's training with GRPO, a method introduced in the "DeepSeekMath" paper, suggests a focus on improving mathematical problem-solving and reasoning abilities.
Instruction Following: As an instruction-tuned model, it is designed to accurately interpret and execute user prompts and instructions.
Large Context Window: It supports a context length of 131072 tokens, allowing it to process and generate responses based on extensive input information.

Training Details

The model was trained using TRL version 0.18.2, with Transformers 4.52.4 and PyTorch 2.7.1. The GRPO method, detailed in the paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, was central to its training procedure.

Good For

Applications requiring strong mathematical reasoning.
Tasks benefiting from a large context window for understanding complex instructions or long documents.
Instruction-following tasks where precise adherence to prompts is crucial.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)