Qwen/Qwen1.5-72B

Visibility: Public
Parameters: 72.3B
Quantization: FP8
Context length: 32,768 tokens
License: tongyi-qianwen
Weights: Hugging Face
Overview

Qwen1.5-72B: A Powerful Multilingual Foundation Model

Qwen1.5-72B is a 72.3 billion parameter model from the Qwen1.5 series, developed by the Qwen team at Alibaba Cloud. The series is the beta version of Qwen2 and is built on a transformer-based, decoder-only architecture. It introduces notable improvements over previous Qwen models, including stronger chat performance and robust multilingual support in both the base and chat variants.

Key Capabilities & Features

  • Extensive Model Sizes: Part of a series offering 8 model sizes: seven dense models (0.5B, 1.8B, 4B, 7B, 14B, 32B, and 72B) plus a 14B MoE model.
  • Stable Long Context: Provides stable support for a 32K token context length across all model sizes, facilitating processing of longer inputs.
  • Multilingual Adaptability: Features an improved tokenizer designed for effective processing of multiple natural languages and code.
  • Architectural Enhancements: Uses a Transformer architecture with SwiGLU activation and attention QKV bias. In this beta version, Group Query Attention (GQA) is temporarily omitted (except for the 32B model), and the mixture of sliding-window and full attention is not yet included.
  • Simplified Usage: Eliminates the need for `trust_remote_code`, streamlining integration (see the loading sketch after this list).
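
Because no custom remote code needs to be trusted, the model loads through the standard `transformers` APIs. Below is a minimal loading-and-continuation sketch, assuming `transformers` >= 4.37.0 (the release that added Qwen2-architecture support) and enough GPU memory for the 72B weights; the prompt and generation settings are illustrative only.

```python
# Minimal loading/continuation sketch; prompt and settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-72B"

# No trust_remote_code flag is required for Qwen1.5 models.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # shard the 72B weights across available GPUs
)

# As a base model, Qwen1.5-72B is suited to text continuation rather than chat.
inputs = tokenizer("The three primary colors are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```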

Recommended Use Cases

Qwen1.5-72B is primarily intended as a base language model for further development. It is not advised for direct text generation without additional post-training. Developers should consider applying techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining to adapt this model for specific downstream applications and achieve optimal performance.
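
To make the post-training recommendation concrete, here is a minimal parameter-efficient SFT sketch using LoRA via the `peft` library. The rank, target module names, toy dataset, and output path are illustrative assumptions, not an official Qwen recipe.

```python
# Hypothetical LoRA-based SFT sketch; hyperparameters and data are illustrative.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "Qwen/Qwen1.5-72B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Attach LoRA adapters so only a small fraction of the weights is updated during SFT.
model = get_peft_model(model, LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
))

# Toy instruction/response pairs standing in for a real SFT corpus.
raw = Dataset.from_list([
    {"text": "Instruction: Name the capital of France.\nResponse: Paris."},
    {"text": "Instruction: Translate 'hello' to French.\nResponse: Bonjour."},
])
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen1.5-72b-sft",        # hypothetical output directory
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-4,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=train_dataset,
    # Causal-LM collator: pads batches and copies input_ids to labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

LoRA keeps the 72B base weights frozen and trains only small adapter matrices, which makes supervised fine-tuning of a model this size far more tractable than full-parameter training; full fine-tuning, RLHF, or continued pretraining follow the same pattern of adapting the base checkpoint before deployment.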