Neira/Qwen2.5-0.5B_muon_v2_simple

Source: Hugging Face
Task: Text Generation
Concurrency Cost: 1
Model Size: 0.5B
Quant: BF16
Ctx Length: 32k
Published: Apr 26, 2026
License: apache-2.0
Architecture: Transformer
Open Weights, Warm

Neira/Qwen2.5-0.5B_muon_v2_simple is a 0.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-0.5B. Its training used the Muon optimizer, which is the main detail the model card highlights. It is intended for general language tasks; the card does not describe any task-specific optimization.


Model Overview

Neira/Qwen2.5-0.5B_muon_v2_simple is a compact language model fine-tuned from the Qwen/Qwen2.5-0.5B base model. It has 0.5 billion parameters and a context length of 32,768 tokens, which suits applications that need a small memory footprint but still want a long context window.
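To make "smaller footprint" concrete, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. It assumes the nominal 0.5 billion parameter count and 2 bytes per parameter for BF16; the function name is illustrative, and activations, KV cache, and framework overhead are not included.

```python
BYTES_PER_PARAM_BF16 = 2  # BF16 stores each parameter in 16 bits


def weight_memory_gib(num_params, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Approximate memory for the weights only, in GiB."""
    return num_params * bytes_per_param / 2**30


# Nominal parameter count from the model card (0.5B); the exact count may differ.
estimate = weight_memory_gib(0.5e9)
print(f"Approximate weight memory: {estimate:.2f} GiB")
```

Under these assumptions the weights occupy a little under 1 GiB, so the model should fit on most consumer GPUs and on CPU-only machines.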

Training Details

The model was trained for 1 epoch with a learning rate of 5e-05 and a total batch size of 32, obtained from a per-step train batch size of 4 and 8 gradient accumulation steps (4 × 8 = 32). Training used the Muon optimizer and a cosine learning rate scheduler with a warmup setting of 0.01. The card reports this value as warmup steps; because it is fractional, it most likely denotes a fraction of the total training steps rather than a step count. Framework versions were Transformers 5.5.4, PyTorch 2.10.0+cu128, Datasets 4.8.3, and Tokenizers 0.22.2.
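The batch-size arithmetic and the shape of a cosine schedule with linear warmup can be sketched in plain Python. This is an illustration, not the training code: it assumes a single device, reads the reported 0.01 as a fraction of total steps, and uses a hypothetical total of 1000 steps because the card does not state the step count.

```python
import math

PEAK_LR = 5e-5          # from the model card
PER_DEVICE_BATCH = 4    # from the model card
GRAD_ACCUM_STEPS = 8    # from the model card
WARMUP_RATIO = 0.01     # assumption: the card's 0.01 read as a fraction of total steps
TOTAL_STEPS = 1000      # hypothetical; the card does not give the number of steps


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Samples contributing to one optimizer update."""
    return per_device * accum_steps * num_devices


def cosine_lr(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR,
              warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))
for step in (0, 5, 10, 500, 1000):
    print(step, cosine_lr(step))
```

With these assumed values the learning rate climbs linearly over the first 10 steps, peaks at 5e-05, and decays to zero by the final step.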

Intended Use Cases

The model card does not detail specific use cases or limitations. As a fine-tuned Qwen2.5-0.5B model, it is generally applicable to tasks such as text generation, summarization, and question answering where a smaller, efficient model is preferred. The fine-tuning dataset is not specified, so whether the model is specialized for a particular domain, and where its main strengths lie, cannot be determined without further details.
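For text generation, a minimal sketch using the Transformers `pipeline` API might look like the following. It assumes the repository is publicly downloadable under the ID shown in this card and that `transformers` and `torch` are installed; the `generate` helper and its default of 64 new tokens are illustrative choices, not settings from the card.

```python
MODEL_ID = "Neira/Qwen2.5-0.5B_muon_v2_simple"  # repository ID as given in this card


def generate(prompt, max_new_tokens=64):
    """Generate a continuation of `prompt` with the fine-tuned model.

    The import is deferred so this module can be loaded without
    transformers installed; the weights download on the first call.
    """
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]
```

Calling `generate("Explain what an optimizer does in one sentence.")` should return the prompt followed by the model's continuation. Output quality for any particular task is untested here, since the card does not describe the fine-tuning data.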