AIR-hl/Qwen2.5-1.5B-ultrachat200k

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Nov 17, 2024License:apache-2.0Architecture:Transformer Open Weights Warm

AIR-hl/Qwen2.5-1.5B-ultrachat200k is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-1.5B. It was trained on the HuggingFaceH4/ultrachat_200k dataset, leveraging flash_attention_2 for efficient processing. This model is optimized for conversational AI tasks, demonstrating improved performance in chat-based interactions.

Loading preview...

Overview

AIR-hl/Qwen2.5-1.5B-ultrachat200k is a 1.5 billion parameter instruction-tuned model, building upon the Qwen/Qwen2.5-1.5B base model. It is licensed under Apache 2.0 and was fine-tuned using the trl framework.

Key Capabilities

  • Instruction Following: Enhanced ability to follow instructions due to fine-tuning on the ultrachat_200k dataset.
  • Efficient Training: Utilizes flash_attention_2 for optimized attention mechanisms during training, contributing to faster processing.
  • Conversational AI: Specifically trained on a large-scale chat dataset, making it suitable for dialogue-oriented applications.
  • Quantization Support: Designed to work with quantization configurations, allowing for potential deployment on resource-constrained environments.

Training Details

The model underwent a single epoch of training with a learning rate of 5e-5 and a max_seq_length of 2048. Key training hyperparameters included bf16 precision and a warmup_ratio of 0.1. The training process resulted in a final training loss of 1.192 and an evaluation loss of 1.2003.

Good For

  • Developing chatbots and virtual assistants.
  • Applications requiring robust instruction-following in a conversational context.
  • Research and experimentation with smaller, efficient instruction-tuned models.