tangger/Qwen-7B-Chat

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 32k | Published: Sep 13, 2023 | Architecture: Transformer

Qwen-7B-Chat is a 7-billion-parameter instruction-tuned large language model developed by Alibaba Cloud as part of the Qwen (Tongyi Qianwen) series. It is built on a Transformer architecture, pretrained on a large, diverse dataset that includes web text, books, and code, and tuned to act as an AI assistant. The model supports a 32768-token context length and performs well on Chinese and English understanding, coding, and mathematical reasoning benchmarks. It is particularly strong at tool use, and an Int4 quantized version is available.


tangger/Qwen-7B-Chat: An Instruction-Tuned AI Assistant

This model, tangger/Qwen-7B-Chat, is a 7-billion-parameter instruction-tuned variant in Alibaba Cloud's Qwen (Tongyi Qianwen) large language model series. It is built on a Transformer architecture and was pretrained on a large, diverse dataset of web text, books, and code. The tangger version is a re-upload of the Qwen-7B-Chat model from September 11, provided as a temporary backup.
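
A minimal loading sketch, under several assumptions: that the repository ID tangger/Qwen-7B-Chat is still available on the Hugging Face Hub, that the transformers and accelerate packages are installed, and that this re-upload ships the same custom modeling code as upstream Qwen-7B-Chat, which is what would provide the chat helper called below. The function names load_chat_model and ask are hypothetical and exist only for this illustration.

```python
MODEL_ID = "tangger/Qwen-7B-Chat"  # repository ID taken from this page


def load_chat_model(model_id: str = MODEL_ID):
    """Load tokenizer and model. The import is deferred so that this
    module can be imported even where transformers is not installed."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True lets transformers run the custom modeling
    # code stored in the model repository.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",  # assumption: accelerate is installed
        trust_remote_code=True,
    ).eval()
    return tokenizer, model


def ask(model, tokenizer, query: str, history=None):
    """Run one chat turn.

    Assumes the repository's remote code exposes
    model.chat(tokenizer, query, history=...) returning (response, history).
    """
    response, history = model.chat(tokenizer, query, history=history)
    return response, history
```

Calling load_chat_model() downloads the weights on first use, so it is best tried on a machine with enough disk space and accelerator memory for a 7B model. Passing the returned history back into ask carries the conversation across turns.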

Key Capabilities & Features

  • Strong Multilingual Performance: Achieves competitive results on both Chinese (C-Eval: 54.2% Avg. Acc.) and English (MMLU: 53.9% Avg. Acc.) understanding benchmarks among models of similar scale.
  • Coding Proficiency: Demonstrates solid coding abilities, scoring 24.4 Pass@1 on HumanEval.
  • Mathematical Reasoning: Performs well on mathematical tasks, with 41.1% Zero-shot Acc. on GSM8K.
  • Extended Context Length: Supports a context length of 32768 tokens, with techniques like NTK-aware interpolation and LogN attention scaling for long-context understanding (e.g., 16.6 Rouge-L on VCSUM).
  • Advanced Tool Usage: Excels in tool-use capabilities, supporting ReAct Prompting with 99% Tool Selection accuracy and low false positive rates, and functions effectively as a HuggingFace Agent.
  • Efficient Quantization: Offers an Int4 quantized version (Qwen/Qwen-7B-Chat-Int4) with nearly lossless performance, faster inference, and lower memory usage (e.g., 45.60 tokens/s when generating 2048 tokens, vs. 30.53 tokens/s for BF16).
  • Optimized Tokenization: Utilizes a 150K+ token vocabulary based on tiktoken, optimized for efficient encoding of Chinese, English, and code, with multilingual friendliness.
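
The two long-context techniques named in the list above can be sketched in a few lines. This is a generic textbook-style formulation, not Qwen's actual implementation: the function names, the RoPE base of 10000, the head dimension of 128, the scale factor of 4, and the training length of 8192 are all assumptions chosen for illustration.

```python
import math


def ntk_scaled_rope_base(base: float, head_dim: int, scale: float) -> float:
    """NTK-aware interpolation: enlarge the rotary embedding base so that
    low-frequency dimensions are stretched to cover a longer context while
    high-frequency dimensions stay almost unchanged."""
    if scale <= 1.0:
        return base  # within the trained length, no adjustment
    return base * scale ** (head_dim / (head_dim - 2))


def logn_attention_scale(position: int, train_len: int) -> float:
    """LogN attention scaling: multiply the query at a given 1-indexed
    position by log(position) / log(train_len) once the position exceeds
    the training length; shorter positions are left untouched."""
    if position <= train_len:
        return 1.0
    return math.log(position) / math.log(train_len)


if __name__ == "__main__":
    # Hypothetical values: extend a context 4x beyond an assumed 8192 training length.
    print(ntk_scaled_rope_base(10000.0, 128, 4.0))
    print(logn_attention_scale(32768, 8192))
```

The enlarged base would replace the original one when computing rotary frequencies, and the LogN factor would be applied to queries before the attention dot product.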

Good for

  • Developing AI assistants requiring strong conversational and reasoning abilities.
  • Applications needing robust performance in both Chinese and English language tasks.
  • Code generation and mathematical problem-solving.
  • Scenarios demanding efficient processing of long text contexts.
  • Integrating with external tools and APIs via ReAct prompting or as a HuggingFace Agent, especially where high tool selection accuracy is critical.
  • Deployment in environments with memory constraints, leveraging its efficient Int4 quantization.
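
For the tool-integration use case, ReAct prompting amounts to describing the available tools in the prompt and then parsing the Action and Action Input lines the model writes back. The sketch below uses the widely used generic ReAct template; the tool description format, the helper names build_react_prompt and parse_action, and the get_weather tool are hypothetical and are not taken from Qwen's documentation.

```python
import json
import re

REACT_TEMPLATE = """Answer the following question as best you can. You have access to the following tools:

{tool_descs}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {query}"""


def build_react_prompt(tools, query):
    """Render tool descriptions and the user question into one prompt."""
    tool_descs = "\n\n".join(
        f"{t['name']}: {t['description']} Parameters: {json.dumps(t['parameters'])}"
        for t in tools
    )
    tool_names = ", ".join(t["name"] for t in tools)
    return REACT_TEMPLATE.format(
        tool_descs=tool_descs, tool_names=tool_names, query=query
    )


def parse_action(text):
    """Extract (tool_name, arguments) from model output, or None if the
    model did not request a tool call."""
    match = re.search(
        r"Action:\s*(.*?)\nAction Input:\s*(.*?)(?:\nObservation:|\Z)",
        text,
        re.DOTALL,
    )
    if match is None:
        return None
    name = match.group(1).strip()
    raw_args = match.group(2).strip()
    try:
        args = json.loads(raw_args)  # prefer structured arguments
    except json.JSONDecodeError:
        args = raw_args  # fall back to the raw string
    return name, args
```

In a full loop, generation would typically be stopped at "Observation:", the selected tool would be run with the parsed arguments, and its result would be appended to the prompt before asking the model to continue.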