Qwen/Qwen1.5-1.8B
Qwen1.5-1.8B is a 1.8 billion parameter, transformer-based decoder-only language model developed by Qwen, serving as a beta version of Qwen2. It is pretrained on a large dataset and supports a stable 32K context length. This model is designed for further fine-tuning and post-training applications rather than direct text generation.
Qwen1.5-1.8B Model Overview
Qwen1.5-1.8B is part of the Qwen1.5 series, a beta release for Qwen2, developed by Qwen. This transformer-based decoder-only language model features 1.8 billion parameters and is pretrained on extensive data. It incorporates a stable 32K context length across all model sizes and an improved tokenizer designed for multiple natural languages and code.
Key Capabilities & Improvements
- Multilingual Support: Both base and chat models offer enhanced multilingual capabilities.
- Stable Context Length: Consistently supports a 32K token context window.
- Architecture: Based on the Transformer architecture, utilizing SwiGLU activation and attention QKV bias.
- Ease of Use: Does not require `trust_remote_code` for deployment.
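Because Qwen2-style models are supported natively in recent versions of Hugging Face `transformers` (4.37.0 and later), the model can be loaded without `trust_remote_code`. A minimal sketch, assuming `transformers` and a PyTorch backend are installed; the prompt text is illustrative:

```python
# Minimal sketch: loading and sampling from Qwen1.5-1.8B with the
# Hugging Face transformers library (>= 4.37.0, which added native
# Qwen2 support, so no trust_remote_code flag is required).
# Note: as a base model, it will simply continue the prompt rather
# than follow instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen1.5-1.8B"


def main() -> None:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # torch_dtype="auto" picks the checkpoint's native precision.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(out[0], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

The download is roughly 3.6 GB in fp16, so the guarded `main()` keeps the weights from being fetched on import.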
Recommended Use Cases
This base language model is not advised for direct text generation. Instead, it is optimized for developers to apply further post-training techniques such as:
- Supervised Fine-Tuning (SFT)
- Reinforcement Learning from Human Feedback (RLHF)
- Continued Pretraining
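Of the techniques above, SFT (or continued pretraining, which uses the same causal-LM objective) can be sketched with the standard Hugging Face `Trainer` API. This is a hedged illustration, not a recipe from the Qwen team: the output directory and hyperparameters are placeholders, and the training dataset is assumed to be pre-tokenized:

```python
# A sketch of supervised fine-tuning (SFT) for Qwen1.5-1.8B with the
# Hugging Face Trainer API. Hyperparameters below are illustrative
# placeholders, not recommendations from the model card.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "Qwen/Qwen1.5-1.8B"


def build_sft_trainer(train_dataset) -> Trainer:
    """Assemble a Trainer for causal-LM fine-tuning of the base model.

    `train_dataset` is assumed to be a tokenized dataset whose examples
    contain an `input_ids` field.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    args = TrainingArguments(
        output_dir="qwen1.5-1.8b-sft",      # placeholder path
        per_device_train_batch_size=2,      # illustrative values only
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
    )
    # mlm=False selects the standard next-token (causal LM) objective,
    # which is shared by SFT and continued pretraining.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        data_collator=collator,
    )
```

RLHF-style post-training needs additional machinery (a reward model or preference data) beyond this sketch; libraries such as TRL build that on top of the same base model.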
For more details, refer to the Qwen1.5 GitHub repository.