Qwen1.5-1.8B Model Overview
Qwen1.5-1.8B is part of the Qwen1.5 series, the beta release of Qwen2 developed by the Qwen team. This transformer-based, decoder-only language model has 1.8 billion parameters and is pretrained on a large volume of data. All model sizes in the series support a stable 32K context length and share an improved tokenizer adapted to multiple natural languages and code.
Key Capabilities & Improvements
- Multilingual Support: Both base and chat models offer enhanced multilingual capabilities.
- Stable Context Length: Consistently supports a 32K token context window.
- Architecture: Based on the Transformer architecture, utilizing SwiGLU activation and attention QKV bias.
- Ease of Use: Does not require `trust_remote_code` for deployment.
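Because no `trust_remote_code` flag is needed, the model can be loaded with the standard Hugging Face `transformers` auto classes. The following is a minimal sketch, assuming `transformers` is installed (the Qwen1.5 release notes recommend version 4.37.0 or later) and using the Hub model id `Qwen/Qwen1.5-1.8B`:

```python
def load_qwen15_base(model_id: str = "Qwen/Qwen1.5-1.8B"):
    """Load the Qwen1.5-1.8B base model and tokenizer.

    Note that no trust_remote_code=True argument is required,
    since Qwen1.5 is natively supported by transformers.
    """
    # Deferred import so the function definition itself has no
    # heavyweight dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # place weights on the available device(s)
    )
    return tokenizer, model

# Usage (downloads the checkpoint from the Hub on first call):
# tokenizer, model = load_qwen15_base()
```

`torch_dtype="auto"` and `device_map="auto"` are conveniences, not requirements; they can be replaced with explicit dtype and device placement if preferred.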
Recommended Use Cases
As a base language model, it is not recommended for direct text generation. Instead, it is intended as a starting point for developers to apply post-training techniques such as:
- Supervised Fine-Tuning (SFT)
- Reinforcement Learning from Human Feedback (RLHF)
- Continued Pretraining
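To make the first of these concrete, the function below sketches a single supervised fine-tuning (SFT) step for a causal LM. It is a deliberately minimal illustration, not the Qwen team's recipe: a real run would use a `Trainer` or similar training loop with an optimizer, scheduler, and batching. The plain-SGD update and the `texts` batch are illustrative assumptions.

```python
def sft_step(model, tokenizer, texts, lr=1e-5):
    """One illustrative SFT step on a batch of raw text strings.

    Minimal sketch only: real fine-tuning would use an optimizer
    (e.g. AdamW), gradient accumulation, and a learning-rate schedule.
    """
    import torch  # deferred import; requires PyTorch at call time

    # Base-model tokenizers may not define a pad token; reuse EOS.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    batch = tokenizer(texts, return_tensors="pt",
                      padding=True, truncation=True)
    # For causal-LM SFT, the labels are the input ids themselves;
    # the model shifts them internally to compute next-token loss.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()

    # Plain SGD update, purely for illustration.
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.add_(p.grad, alpha=-lr)
                p.grad = None
    return outputs.loss.item()
```

RLHF and continued pretraining follow the same loading path but replace this loss/update step with their own objectives and data pipelines.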
For more details, refer to the Qwen1.5 GitHub repository.