Neira/Qwen2.5-0.5B_muon_v2
Neira/Qwen2.5-0.5B_muon_v2 is a 0.5-billion-parameter language model fine-tuned from the Qwen2.5-0.5B architecture. It was trained with the Muon optimizer at a learning rate of 5e-05 and a context length of 32768 tokens. The available documentation does not specify the fine-tuning dataset, intended uses, or primary differentiators.
Model Overview
Neira/Qwen2.5-0.5B_muon_v2 is a fine-tuned variant of the Qwen/Qwen2.5-0.5B base model, with 0.5 billion parameters and a context length of 32768 tokens. It was fine-tuned for 1 epoch with the Muon optimizer at a learning rate of 5e-05.
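The card does not include a usage snippet. As a minimal sketch, the model should load with the standard Hugging Face transformers pattern for causal language models, assuming the repository is available on the Hub; the prompt string below is purely illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Neira/Qwen2.5-0.5B_muon_v2"

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Run a short greedy generation as a smoke test.
inputs = tokenizer("The Muon optimizer is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```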
Training Details
Training used a train_batch_size of 4, an eval_batch_size of 16, and gradient_accumulation_steps of 8, giving a total_train_batch_size of 32. The learning rate followed a cosine schedule with a warmup ratio of 0.01. The training environment included Transformers 5.5.4, Pytorch 2.10.0+cu128, Datasets 4.8.3, and Tokenizers 0.22.2.
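The training script itself is not published. As a hedged reconstruction, the reported hyperparameters map onto Hugging Face TrainingArguments roughly as follows; the output path is hypothetical, and since Muon is not a built-in Trainer optimizer, it would have to be constructed separately (see the closing comment):

```python
from transformers import TrainingArguments

# Hedged reconstruction of the reported hyperparameters; the actual
# training script and dataset are not published, so everything beyond
# the listed values is an assumption.
args = TrainingArguments(
    output_dir="qwen2.5-0.5b-muon-v2",  # hypothetical output path
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,      # 4 x 8 = total_train_batch_size of 32
    learning_rate=5e-5,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
)

# Muon is not a built-in Trainer optimizer option, so a separately
# constructed Muon instance (e.g. from a third-party implementation)
# would be passed to Trainer via its `optimizers=(optimizer, scheduler)`
# argument.
```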
Limitations and Further Information
The available documentation does not specify the fine-tuning dataset, nor does it describe the model's intended uses, limitations, or performance characteristics. Without this information, the model's suitability for any particular application remains undefined.