Qwen/Qwen1.5-1.8B

Parameters: 1.8B · Precision: BF16 · Context length: 32768 · Released: Jan 22, 2024
License: tongyi-qianwen-research
Qwen1.5-1.8B Model Overview

Qwen1.5-1.8B is part of the Qwen1.5 series, a beta release of Qwen2 developed by the Qwen team. This transformer-based, decoder-only language model has 1.8 billion parameters and is pretrained on a large volume of data. The series provides a stable 32K-token context length across all model sizes and an improved tokenizer adapted to multiple natural languages and code.

Key Capabilities & Improvements

  • Multilingual Support: Both base and chat models offer enhanced multilingual capabilities.
  • Stable Context Length: Consistently supports a 32K token context window.
  • Architecture: Based on the Transformer architecture, utilizing SwiGLU activation and attention QKV bias.
  • Ease of Use: Does not require trust_remote_code for deployment.
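Because Qwen1.5 is natively supported in `transformers` (version 4.37.0 or later), the model loads with the standard Auto classes and no `trust_remote_code` flag. A minimal loading sketch (the helper name `load_qwen` is illustrative; weights download from the Hub on first call):

```python
def load_qwen(model_id: str = "Qwen/Qwen1.5-1.8B"):
    """Load the Qwen1.5-1.8B tokenizer and model.

    Qwen1.5 is supported natively in transformers >= 4.37.0,
    so no trust_remote_code flag is required.
    """
    # Imported inside the function so the sketch carries no
    # top-level dependency until it is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # picks up the BF16 checkpoint weights
        device_map="auto",    # places layers on available devices
    )
    return tokenizer, model
```

Calling `load_qwen()` returns a `(tokenizer, model)` pair ready for further post-training; the `device_map="auto"` argument requires the `accelerate` package.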

Recommended Use Cases

As a base (non-chat) language model, it is not recommended for direct text generation. Instead, it is intended as a starting point for developers to apply post-training techniques such as:

  • Supervised Fine-Tuning (SFT)
  • Reinforcement Learning from Human Feedback (RLHF)
  • Continued Pretraining
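To make the SFT step concrete: a common setup concatenates prompt and response token IDs, then masks the prompt positions in the labels with -100 (the default ignore index of PyTorch's cross-entropy loss) so the loss is computed only on the response. A minimal sketch with placeholder token IDs (`build_sft_labels` is an illustrative helper, not part of the Qwen repository):

```python
def build_sft_labels(prompt_ids, response_ids, ignore_index=-100):
    """Build (input_ids, labels) for supervised fine-tuning.

    The prompt positions in the labels are set to ignore_index so
    that the training loss is computed only on the response tokens.
    """
    input_ids = prompt_ids + response_ids
    labels = [ignore_index] * len(prompt_ids) + response_ids
    return input_ids, labels

# Example with placeholder token IDs:
# build_sft_labels([1, 2, 3], [7, 8])
# → ([1, 2, 3, 7, 8], [-100, -100, -100, 7, 8])
```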

For more details, refer to the Qwen1.5 GitHub repository.