hZzy/qwen2-0.5b-sft
The hZzy/qwen2-0.5b-sft model is a 0.5-billion-parameter language model fine-tuned from Qwen/Qwen2-0.5B by hZzy. It was trained on the HuggingFaceH4/ultrachat_200k dataset and achieved a validation loss of 1.5327. This instruction-tuned model targets general conversational AI tasks, with a compact size suited to efficient deployment.
Model Overview
hZzy/qwen2-0.5b-sft is a compact 0.5-billion-parameter language model fine-tuned from the base Qwen/Qwen2-0.5B architecture. This instruction-tuned variant was developed by hZzy using the HuggingFaceH4/ultrachat_200k dataset for supervised fine-tuning (SFT).
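A minimal usage sketch with the standard `transformers` API is shown below. It assumes `transformers` and `torch` are installed and the model can be fetched from the Hugging Face Hub; the `apply_chat_template` call assumes the tokenizer ships a chat template (inherited from the Qwen2 base), which this card does not confirm.

```python
# Sketch only: requires `transformers`, `torch`, and Hub access.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2-0.5b-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumes a chat template is defined for this tokenizer.
messages = [{"role": "user", "content": "Explain supervised fine-tuning in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```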
Training Details
The model was trained for 1 epoch with a learning rate of 2e-05 and a total batch size of 192, accumulated across 3 devices with 8 gradient accumulation steps (i.e., a per-device batch size of 8). The optimizer was Adam with default betas and epsilon, paired with a cosine learning rate scheduler and a warmup ratio of 0.1. Mixed-precision training (Native AMP) was employed. The model reached a validation loss of 1.5327.
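The hyperparameters above can be summarized as a plain dictionary; the per-device batch size of 8 is not stated directly in the card but follows from 192 / (3 devices × 8 accumulation steps):

```python
# Recap of the reported training hyperparameters.
hparams = {
    "learning_rate": 2e-05,
    "num_train_epochs": 1,
    "per_device_train_batch_size": 8,   # inferred, not stated directly
    "num_devices": 3,
    "gradient_accumulation_steps": 8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
}

# Effective (total) batch size = per-device batch * devices * accumulation steps.
effective_batch = (
    hparams["per_device_train_batch_size"]
    * hparams["num_devices"]
    * hparams["gradient_accumulation_steps"]
)
print(effective_batch)  # 192, matching the total batch size reported above
```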
Potential Use Cases
Given its instruction-tuned nature and compact size, this model is suitable for:
- Lightweight conversational agents: Deploying in environments with limited computational resources.
- Quick prototyping: Rapidly testing and iterating on language-based applications.
- Educational purposes: Understanding the principles of instruction tuning on smaller models.