hZzy/qwen2-0.5b-sft

Text generation · Concurrency cost: 1 · Model size: 0.5B · Quantization: BF16 · Context length: 32k · Published: Sep 10, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

The hZzy/qwen2-0.5b-sft model is a 0.5 billion parameter language model, fine-tuned from Qwen/Qwen2-0.5B by hZzy. It was trained on the HuggingFaceH4/ultrachat_200k dataset, achieving a validation loss of 1.5327. This instruction-tuned model is designed for general conversational AI tasks, leveraging its compact size for efficient deployment.

Model Overview

hZzy/qwen2-0.5b-sft is a compact 0.5 billion parameter language model fine-tuned from the base Qwen/Qwen2-0.5B architecture. This instruction-tuned variant was developed by hZzy using the HuggingFaceH4/ultrachat_200k dataset for supervised fine-tuning (SFT).
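
A quick way to try the model locally is through the Hugging Face transformers library. The snippet below is a minimal sketch rather than an official example from the model card: it assumes the repository ships the standard Qwen2 tokenizer and chat template, and it loads the weights in bfloat16 to match the published precision.

```python
# Minimal sketch: load hZzy/qwen2-0.5b-sft and run a single chat turn.
# Assumes the checkpoint includes a chat template (standard for Qwen2 SFT models).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2-0.5b-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Explain supervised fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # open an assistant turn so the model starts answering
    return_tensors="pt",
)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```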

Training Details

The model was trained for 1 epoch with a learning rate of 2e-05 and a total batch size of 192 (3 devices with gradient accumulation of 8, i.e. 8 samples per device). The optimizer was Adam with default betas and epsilon, paired with a cosine learning rate scheduler and a warmup ratio of 0.1. Mixed-precision training (Native AMP) was employed. During training, the model achieved a validation loss of 1.5327.
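
For reference, these hyperparameters map directly onto transformers TrainingArguments. The configuration below is a reconstruction from the figures above, not the author's actual training script: the output directory name is hypothetical, the per-device batch size of 8 is derived arithmetically from the 192 total, and bf16 is an assumption (the card only states that Native AMP was used).

```python
# Sketch: TrainingArguments mirroring the reported SFT hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2-0.5b-sft",     # hypothetical output path
    num_train_epochs=1,
    learning_rate=2e-5,
    per_device_train_batch_size=8,   # inferred: 192 total / (3 devices * 8 accumulation steps)
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",             # Adam-family optimizer with default betas and epsilon
    bf16=True,                       # assumption; the card reports mixed-precision (Native AMP)
)
```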

Potential Use Cases

Given its instruction-tuned nature and compact size, this model is suitable for:

  • Lightweight conversational agents: Deploying in environments with limited computational resources (a minimal chat loop is sketched after this list).
  • Quick prototyping: Rapidly testing and iterating on language-based applications.
  • Educational purposes: Understanding the principles of instruction tuning on smaller models.
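
To make the first point concrete, below is a minimal sketch of a multi-turn chat loop. It assumes `model` and `tokenizer` have already been loaded as in the earlier snippet and simply re-encodes the running history through the chat template on every turn.

```python
# Sketch: tiny interactive chat loop on top of the previously loaded model/tokenizer.
history = []
while True:
    user_input = input("You: ")
    if not user_input.strip():
        break  # empty line ends the session
    history.append({"role": "user", "content": user_input})

    input_ids = tokenizer.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(
        input_ids, max_new_tokens=256, do_sample=True, temperature=0.7
    )
    reply = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

    print("Assistant:", reply)
    history.append({"role": "assistant", "content": reply})
```

Because the full history is re-encoded on every turn, long conversations will eventually approach the 32k context limit; a production agent would truncate or summarize older turns.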