Qwen/Qwen1.5-4B

Parameters: 4B
Precision: BF16
Context length: 32768
License: tongyi-qianwen-research
Overview

Qwen1.5-4B: A Beta Release of Qwen2

Qwen1.5-4B is a 4 billion parameter model within the Qwen1.5 series, representing a significant update to the original Qwen architecture. Developed by Qwen, this transformer-based, decoder-only language model is pretrained on extensive data and offers several key improvements over its predecessor.

Key Capabilities & Features

  • Multilingual Support: Both base and chat models are designed with enhanced multilingual capabilities.
  • Extended Context Length: Provides stable support for a 32K token context window across all model sizes.
  • Improved Tokenizer: Features an adaptive tokenizer optimized for multiple natural languages and code.
  • Simplified Usage: No longer requires trust_remote_code, streamlining integration (see the loading sketch after this list).
  • Architectural Enhancements: Built on a Transformer decoder with SwiGLU activation and attention QKV bias; grouped query attention (GQA) and the mixture of sliding-window and full attention are temporarily excluded from this beta release.
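
The snippet below is a minimal sketch of loading the model with the Hugging Face transformers library (version 4.37.0 or later, which ships native Qwen1.5 support), so no trust_remote_code flag is needed. The prompt and generation settings are illustrative only, and as noted in the next section the base model is intended as a starting point for post-training rather than for direct text generation.

```python
# Minimal loading sketch; requires transformers >= 4.37.0 (native Qwen1.5 support),
# so trust_remote_code is not needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # picks up the BF16 weights where supported
    device_map="auto",    # requires the accelerate package
)

# Quick sanity check: continue a prompt with the raw base model.
inputs = tokenizer("The Qwen1.5 series of language models", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```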

Recommended Use Cases

This base model is primarily intended for developers and researchers who plan to perform further post-training. It serves as an excellent foundation for:

  • Supervised Fine-Tuning (SFT): Adapting the model to specific tasks or datasets (a minimal SFT sketch follows this list).
  • Reinforcement Learning from Human Feedback (RLHF): Aligning the model's behavior with human preferences.
  • Continued Pretraining: Further training on specialized datasets to enhance domain-specific knowledge.
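
As a rough illustration of this post-training workflow, the sketch below fine-tunes the base model on a plain-text corpus with the standard transformers Trainer. The dataset file, hyperparameters, and output path are hypothetical placeholders rather than recommendations from the model card.

```python
# Minimal SFT sketch using the plain transformers Trainer on a text corpus.
# File names and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Qwen/Qwen1.5-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:           # guard in case no pad token is defined
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Hypothetical plain-text dataset; replace with your own SFT corpus.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen1.5-4b-sft",       # illustrative output path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        bf16=True,                          # matches the BF16 weights (Ampere+ GPUs)
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # Causal-LM collator (mlm=False) builds labels from the input ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The same scaffold applies to continued pretraining by swapping in a larger domain corpus; RLHF-style alignment typically builds on a separate preference-optimization library and is beyond the scope of this sketch.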