Neira/Qwen2.5-0.5B_adamw_v2

Text generation · Model size: 0.5B · Quant: BF16 · Context length: 32k · Published: Apr 24, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Neira/Qwen2.5-0.5B_adamw_v2 is a 0.5 billion parameter causal language model fine-tuned from the Qwen2.5-0.5B base model. It was trained with a learning rate of 5e-05 and the AdamW optimizer. The fine-tuning dataset and primary use case are not documented, but the model's small size makes it a candidate for resource-constrained applications or for further specialization on narrow tasks.


Model Overview

Neira/Qwen2.5-0.5B_adamw_v2 is a compact 0.5 billion parameter language model derived from the Qwen/Qwen2.5-0.5B base model. It has been fine-tuned, although the dataset used for fine-tuning is not publicly documented. The model retains the base model's 32,768-token context window, which is notable for a model of this size.
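A minimal loading-and-generation sketch with Hugging Face Transformers is shown below. It assumes the checkpoint is available under the repo id Neira/Qwen2.5-0.5B_adamw_v2 with the standard Qwen2.5 tokenizer bundled; the prompt and sampling settings are illustrative, not part of the model card.

```python
# Minimal sketch: load the checkpoint with Transformers and generate a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Neira/Qwen2.5-0.5B_adamw_v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    device_map="auto",
)

# Illustrative prompt; replace with your own input.
inputs = tokenizer("The AdamW optimizer differs from Adam in that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```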

Training Details

The fine-tuning run used the following hyperparameters; a sketch of how they map onto a standard Hugging Face training configuration follows the list.

  • Learning Rate: 5e-05
  • Optimizer: AdamW (fused `adamw_torch_fused` implementation) with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Size: A train batch size of 4, with a total effective batch size of 32 due to gradient accumulation steps of 8.
  • Epochs: Trained for 1.0 epoch.
  • Scheduler: Cosine learning rate schedule with a warmup ratio of 0.01.
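The actual training script and dataset are not published. The following is a hedged sketch of how the listed hyperparameters would map onto Hugging Face `TrainingArguments`; the output path and single-device assumption are supplied for illustration, and dataset loading and `Trainer` wiring are omitted.

```python
# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-adamw-v2",  # hypothetical output path
    learning_rate=5e-5,
    optim="adamw_torch_fused",           # fused AdamW; betas/epsilon left at defaults (0.9, 0.999, 1e-8)
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,       # 4 x 8 = 32 effective batch size (single device assumed)
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    bf16=True,                           # consistent with the BF16 weights
)
```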

Potential Use Cases

Given its small parameter count and efficient training configuration, this model is likely suitable for:

  • Edge device deployment: Its compact size makes it well suited to environments with limited computational resources; see the quantized-loading sketch after this list.
  • Specialized tasks: Could be effective for niche applications after further domain-specific fine-tuning.
  • Rapid prototyping: Its efficiency allows for quick experimentation and iteration in development workflows.
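As an illustration of the edge-deployment point, the sketch below loads the model with 4-bit quantization via bitsandbytes to shrink its memory footprint. The quantization settings are assumptions for demonstration and are not part of the published model card; the `bitsandbytes` package must be installed.

```python
# Illustrative sketch: 4-bit quantized loading for resource-constrained deployment.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Neira/Qwen2.5-0.5B_adamw_v2"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # assumption: 4-bit weights are acceptable for the target task
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep compute in BF16, matching the published weights
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```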