Mahdikp/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-chattering_whistling_kingfisher
Mahdikp/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-chattering_whistling_kingfisher is a 0.5 billion parameter instruction-tuned language model, fine-tuned from unsloth/Qwen2.5-0.5B-Instruct. This model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities, as introduced in the DeepSeekMath paper. With a 32768 token context length, it is optimized for tasks requiring improved mathematical reasoning and instruction following.
Model Overview
This model, Mahdikp/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-chattering_whistling_kingfisher, is a fine-tuned variant of the unsloth/Qwen2.5-0.5B-Instruct base model. It features 0.5 billion parameters and supports a substantial context length of 32768 tokens, making it suitable for processing longer inputs and generating detailed responses.
Key Training Details
- Fine-tuning Method: The model was trained using GRPO (Group Relative Policy Optimization), a reinforcement learning method noted for its effectiveness at improving mathematical reasoning in language models. The technique was originally presented in the DeepSeekMath paper.
- Frameworks: Training was conducted using TRL (Transformer Reinforcement Learning) version 0.18.1, alongside Transformers 4.52.4 and PyTorch 2.7.1.
Potential Use Cases
- Instruction Following: As an instruction-tuned model, it is designed to accurately follow user prompts and generate relevant outputs.
- Mathematical Reasoning: The application of the GRPO training method suggests enhanced capabilities in tasks that involve mathematical problem-solving and logical deduction, particularly for a model of its size.
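As a quick illustration of how a prompt for this checkpoint is typically structured, the sketch below builds a ChatML-style conversation string by hand. The `<|im_start|>`/`<|im_end|>` markup follows the Qwen2.5 convention; in real use you would load the tokenizer with `transformers` and call `tokenizer.apply_chat_template`, which produces the same format. The system/user messages here are illustrative placeholders, not part of the model card.

```python
# Sketch: hand-building a ChatML-style prompt, as used by Qwen2.5 models.
# In practice, prefer tokenizer.apply_chat_template from transformers;
# this standalone version only shows the expected conversation layout.

MODEL_ID = "Mahdikp/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-chattering_whistling_kingfisher"

def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave the assistant turn open so generation continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "What is 17 * 24?"},
])
print(prompt)
```

Generation itself would then pass this prompt (or the tokenized chat template) to the model loaded via `AutoModelForCausalLM.from_pretrained(MODEL_ID)`.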