Qwen/Qwen1.5-32B

Parameters: 32.5B
Precision: FP8
Context length: 32768
License: tongyi-qianwen-research

Qwen1.5-32B Overview

Qwen1.5-32B is a 32.5 billion parameter model within the Qwen1.5 series, which represents the beta version of Qwen2. This transformer-based, decoder-only language model is pretrained on extensive data and offers several key advancements over its predecessors. It is part of a family of eight models ranging from 0.5B to 72B parameters, including an MoE variant.

Key Capabilities & Improvements

  • Enhanced Performance: Significant improvements in chat model performance compared to previous Qwen iterations.
  • Multilingual Support: Both base and chat models offer robust multilingual capabilities.
  • Extended Context Length: Provides stable support for a 32K context length across all model sizes.
  • Simplified Usage: Eliminates the need for trust_remote_code, streamlining integration.
  • Architectural Foundation: Built on the Transformer architecture, incorporating features such as SwiGLU activation, attention QKV bias, and group query attention (used in the 32B model); see the configuration sketch after this list.
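
These architectural details can be checked against the published model configuration. The following is a minimal sketch, assuming transformers>=4.37.0 (which registers the qwen2 architecture) and access to the Hugging Face Hub; the attribute names follow the Qwen2 configuration class.

```python
# Sketch: inspect the published config to confirm architectural details.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen1.5-32B")

print(config.model_type)               # "qwen2"
print(config.hidden_act)               # "silu", the gate activation used in the SwiGLU MLP
print(config.num_attention_heads)      # number of query heads
print(config.num_key_value_heads)      # fewer KV heads than query heads => group query attention
print(config.max_position_embeddings)  # 32768 context length for this checkpoint
```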

Usage Recommendations

Qwen1.5-32B is primarily intended as a base model for further development. Users are advised against using the base language model directly for text generation. Instead, it is recommended for post-training applications such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining to adapt it to specific use cases. Loading the model requires transformers>=4.37.0; earlier versions do not recognize the qwen2 model type.
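
As a concrete illustration of the simplified loading path (no trust_remote_code flag), here is a minimal sketch using the standard transformers API. The dtype and device-placement arguments are assumptions to be adapted to the available hardware, and the short generation call is only a sanity check of the base model, not its intended use.

```python
# Minimal loading sketch (assumes transformers>=4.37.0; trust_remote_code is not needed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # requires accelerate; shards the model across available GPUs
)

# The base model is not instruction-tuned; raw generation serves only as a sanity check.
# For real use, apply SFT/RLHF or continued pretraining as recommended above.
inputs = tokenizer("Qwen1.5 is a series of", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```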