The chaejin98330/Qwen2.5-0.5B-Finetuned model is a 0.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-0.5B-Instruct base model. It was trained with a learning rate of 2e-05 over 5 epochs, using the fused AdamW optimizer (adamw_torch_fused) and a linear learning rate scheduler. With a context length of 131072 tokens, this model is a compact, instruction-tuned variant of the Qwen2.5 architecture, suitable for tasks that require a small footprint.
Model Overview
This model, chaejin98330/Qwen2.5-0.5B-Finetuned, is a specialized version derived from the Qwen/Qwen2.5-0.5B-Instruct base model. It features 0.5 billion parameters and supports an extensive context length of 131072 tokens, making it capable of processing very long sequences of text.
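A minimal quick-start sketch using the standard Hugging Face transformers API. The chat-formatted input follows the base Qwen2.5-0.5B-Instruct lineage; the prompt text and generation settings below are illustrative assumptions, not values from the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chaejin98330/Qwen2.5-0.5B-Finetuned"

# Load the fine-tuned weights and the tokenizer inherited from the base model.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Instruct-tuned Qwen2.5 models expect chat-formatted input;
# apply_chat_template renders the prompt format the base model was trained on.
messages = [{"role": "user", "content": "Summarize what a linear LR scheduler does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# max_new_tokens is an illustrative choice, not a model requirement.
outputs = model.generate(inputs, max_new_tokens=128)
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(reply)
```

Because the model is only 0.5B parameters, this runs comfortably on CPU; no device mapping or quantization is strictly required.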
Training Details
The fine-tuning process involved specific hyperparameters:
- Base Model: Qwen/Qwen2.5-0.5B-Instruct
- Learning Rate: 2e-05
- Optimizer: AdamW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: 8 (train and eval), with a total effective batch size of 16 due to gradient accumulation (2 accumulation steps)
- Epochs: 5
- Scheduler: Linear learning rate scheduler
Key Characteristics
While the specific dataset used for fine-tuning is not detailed, the model's origin as an instruction-tuned variant suggests its primary utility lies in following instructions and generating coherent responses. Its compact size (0.5B parameters) combined with a very large context window makes it potentially efficient for applications where memory and computational resources are constrained but long-range understanding is required.
Intended Uses
Given its instruction-tuned nature and small parameter count, this model is likely suitable for:
- Lightweight instruction-following tasks
- Applications requiring long context processing on resource-limited devices
- Further experimentation and fine-tuning on specific, niche datasets where the base Qwen2.5-0.5B-Instruct model's capabilities are a good starting point.