Qwen/Qwen1.5-1.8B

  • Task: Text generation
  • Concurrency cost: 1
  • Model size: 1.8B
  • Quantization: BF16
  • Context length: 32K
  • Published: Jan 22, 2024
  • License: tongyi-qianwen-research
  • Architecture: Transformer

Qwen1.5-1.8B is a 1.8 billion parameter, transformer-based decoder-only language model developed by the Qwen team as a beta release of Qwen2. It is pretrained on a large amount of data and supports a stable 32K context length. The model is intended for further fine-tuning and post-training rather than direct text generation.


Qwen1.5-1.8B Model Overview

Qwen1.5-1.8B is part of the Qwen1.5 series, a beta release for Qwen2, developed by Qwen. This transformer-based decoder-only language model features 1.8 billion parameters and is pretrained on extensive data. It incorporates a stable 32K context length across all model sizes and an improved tokenizer designed for multiple natural languages and code.
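
To see the tokenizer in action, here is a minimal sketch; it assumes only that the transformers library is installed and the Hub checkpoint is reachable, and the sample strings are arbitrary:

```python
from transformers import AutoTokenizer

# The Qwen1.5 tokenizer is natively supported; no trust_remote_code needed.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B")

# The tokenizer is designed to cover multiple natural languages and code.
samples = [
    "The quick brown fox jumps over the lazy dog.",
    "通义千问是一个大规模语言模型。",
    "def fib(n): return n if n < 2 else fib(n - 1) + fib(n - 2)",
]
for text in samples:
    ids = tokenizer(text)["input_ids"]
    print(f"{len(ids):3d} tokens: {text[:40]}")
```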

Key Capabilities & Improvements

  • Multilingual Support: Both base and chat models offer enhanced multilingual capabilities.
  • Stable Context Length: Consistently supports a 32K token context window.
  • Architecture: Transformer decoder utilizing SwiGLU activation and attention QKV bias.
  • Ease of Use: Does not require trust_remote_code for deployment; see the loading sketch below.
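
Because no custom code is required, loading follows the standard transformers pattern. A minimal sketch, assuming transformers >= 4.37.0 (the version the Qwen team recommends for Qwen1.5), the accelerate package for device_map, and enough memory for the BF16 weights:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-1.8B"

# Qwen1.5 is integrated into transformers itself (>= 4.37.0),
# so trust_remote_code is not needed.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",           # requires the accelerate package
)

# The base model is intended for post-training, but a short completion
# confirms the checkpoint loads and runs end to end.
inputs = tokenizer("Qwen1.5 is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```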

Recommended Use Cases

This base language model is not advised for direct text generation. Instead, it is optimized for developers to apply further post-training techniques such as:

  • Supervised Fine-Tuning (SFT), sketched below
  • Reinforcement Learning from Human Feedback (RLHF)
  • Continued Pretraining
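
For the first of these, here is a minimal supervised fine-tuning sketch using the Hugging Face Trainer; the dataset (tatsu-lab/alpaca) and every hyperparameter are illustrative placeholders, not recommendations from the Qwen team:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Qwen/Qwen1.5-1.8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:            # defensive: ensure padding works
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder corpus: any dataset with a formatted "text" column works.
dataset = load_dataset("tatsu-lab/alpaca", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen1.5-1.8b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # Causal LM objective: labels are the input ids, no masking.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```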

For more details, refer to the Qwen1.5 GitHub repository.

Popular Sampler Settings

The most popular Featherless user configurations for this model tune the following sampling parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
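
As a rough guide to how these knobs are applied, here is a sampling sketch using transformers' generate(). The numeric values are arbitrary placeholders, not the Featherless-reported configurations; min_p is only supported in recent transformers releases; and frequency_penalty/presence_penalty are OpenAI-style API parameters set on a serving endpoint rather than passed to generate():

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-1.8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,        # all values here are placeholders
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    min_p=0.05,             # supported in recent transformers releases
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# frequency_penalty and presence_penalty are OpenAI-style API parameters;
# set them via an OpenAI-compatible serving endpoint, not generate().
```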