# Model Overview
This model, Weymouth/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-downy_dense_starfish, is a compact, instruction-tuned language model with 0.5 billion parameters. It is a fine-tuned variant of the unsloth/Qwen2.5-0.5B-Instruct base model, developed to provide efficient instruction-following capabilities.
## Key Training Details
- Fine-tuning Framework: The model was trained using the TRL (Transformer Reinforcement Learning) library, a popular framework for fine-tuning large language models.
- GRPO Method: A notable aspect of its training is the application of GRPO (Group Relative Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). While GRPO was initially developed for mathematical reasoning, its application here suggests an emphasis on robust and efficient learning from instructions.
- Context Length: The model supports a substantial context length of 131,072 tokens, allowing it to process and generate responses based on extensive input.
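The core idea of GRPO mentioned above is to score each sampled completion relative to the other completions in its group, removing the need for a learned value (critic) model. A minimal, illustrative sketch of that group-relative advantage computation follows; in practice, TRL's `GRPOTrainer` handles sampling, clipping, and KL regularization on top of this:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each completion's reward against the mean and
    standard deviation of its group, as in GRPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: rewards for four sampled completions of one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions rewarded above the group mean receive positive advantages and are reinforced; those below the mean are discouraged.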
## Intended Use
This model is suitable for various instruction-following tasks where a smaller, efficient model with a large context window is beneficial. Its training methodology, including GRPO, implies a focus on reliable and structured response generation, making it a candidate for applications requiring consistent output from instructions.
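For instruction-following use, Qwen2.5-Instruct models expect conversations in the ChatML format. The sketch below shows, under that assumption, roughly what the tokenizer's chat template renders; in practice you would call `tokenizer.apply_chat_template` rather than building the string by hand:

```python
def build_chatml_prompt(messages):
    """Render a list of {'role', 'content'} dicts into the ChatML
    format used by Qwen2.5-Instruct models."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    # Open an assistant turn so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize GRPO in one sentence."},
])
```

The trailing open assistant turn is what cues the model to produce its response.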