Name: Neira/Qwen2.5-0.5B_muon_v2_simple API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Neira

Model Overview

Neira/Qwen2.5-0.5B_muon_v2_simple is a compact language model fine-tuned from the Qwen/Qwen2.5-0.5B base model. It has 0.5 billion parameters and a context length of 32768 tokens, which suits applications that need a small footprint but still a reasonably long context window.
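
As a rough illustration of what the 32768-token context length means in practice, the sketch below checks that a prompt plus the requested number of new tokens fits in the window before generating. It assumes the model is published on the Hugging Face Hub under the id shown in this listing and loads through the standard Transformers auto classes; `fits_context` and `generate` are hypothetical helper names, not part of any library.

```python
MODEL_ID = "Neira/Qwen2.5-0.5B_muon_v2_simple"  # assumed Hub repository id
MAX_CONTEXT = 32768  # context length stated in the listing


def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus generated tokens fit in the context window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a completion, refusing requests that would overflow the context."""
    # Imported lazily so the budget helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_tokens = inputs["input_ids"].shape[1]
    if not fits_context(prompt_tokens, max_new_tokens):
        raise ValueError("prompt plus max_new_tokens exceeds the context window")

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Hello")` downloads the weights on first use, so it needs network access; `fits_context` can be used on its own to budget prompts.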

Training Details

The model was trained for 1 epoch with a learning rate of 5e-05 and a total batch size of 32 (a train batch size of 4 with 8 gradient accumulation steps). Training used the Muon optimizer and a cosine learning rate scheduler. The warmup setting is reported as 0.01 warmup steps; as a fractional value, it most likely denotes a warmup ratio rather than a step count. The framework versions were Transformers 5.5.4, PyTorch 2.10.0+cu128, Datasets 4.8.3, and Tokenizers 0.22.2.
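
The batch-size arithmetic and the shape of the learning rate schedule can be sketched in a few lines. This is a minimal illustration, not the actual training script: it assumes a single device (consistent with 4 × 8 = 32), reads the reported 0.01 warmup value as a ratio of total steps, and uses a hypothetical total of 1000 optimizer steps, since the real step count is not given. The schedule is the common linear-warmup-then-cosine-decay form.

```python
import math

LEARNING_RATE = 5e-05   # from the listing
TRAIN_BATCH_SIZE = 4    # from the listing
GRAD_ACCUM_STEPS = 8    # from the listing
WARMUP_RATIO = 0.01     # assumption: the reported 0.01 is a ratio, not a step count
TOTAL_STEPS = 1000      # hypothetical: the real number of steps is not stated


def effective_batch_size(per_step: int, accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update."""
    return per_step * accum_steps * num_devices


def cosine_lr(step: int, total_steps: int, warmup_steps: int,
              base_lr: float = LEARNING_RATE) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)

if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, cosine_lr(step, TOTAL_STEPS, WARMUP_STEPS))
```

Under these assumptions the learning rate peaks at 5e-05 once warmup ends and decays to zero by the final step.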

Intended Use Cases

The provided information does not detail specific use cases or limitations. As a fine-tuned Qwen2.5-0.5B model, it is generally applicable to text generation, summarization, and question answering where a small, efficient model is preferred. The fine-tuning dataset is unspecified, so the model may be specialized, but its primary strengths cannot be determined without further details.
