alpindale/WizardLM-2-8x22B
Text Generation | Concurrency Cost: 4 | Model Size: 141B | Quant: FP8 | Ctx Length: 32k | Published: Apr 16, 2024 | License: apache-2.0 | Architecture: Transformer | 0.4K | Open Weights | Warm

WizardLM-2 8x22B is a 141 billion parameter Mixture of Experts (MoE) large language model developed by WizardLM@Microsoft AI, built upon the Mixtral-8x22B-v0.1 base model. It is designed for complex chat, multilingual interactions, reasoning, and agent tasks, demonstrating highly competitive performance against leading proprietary models. This multilingual model excels in human preference evaluations across writing, coding, math, and reasoning, making it suitable for advanced conversational AI applications.


WizardLM-2 8x22B: Advanced Multilingual MoE Model

WizardLM-2 8x22B is the most advanced model in the WizardLM-2 family, developed by WizardLM@Microsoft AI. This 141 billion parameter Mixture of Experts (MoE) model is built on the mistral-community/Mixtral-8x22B-v0.1 base and is designed for superior performance in complex chat, multilingual communication, reasoning, and agent-based applications.

Key Capabilities & Performance

  • Competitive Performance: Demonstrates highly competitive performance against leading proprietary models and consistently outperforms existing state-of-the-art open-source models.
  • Multilingual Support: Engineered for robust performance across multiple languages.
  • Human Preferences: In human preference evaluations, the model falls just slightly behind GPT-4-1106-preview and performs significantly better than Command R Plus and GPT4-0314 across tasks such as writing, coding, math, reasoning, and agent interactions.
  • MT-Bench Evaluation: Shows highly competitive scores on the automatic MT-Bench evaluation framework.

Training Methodology

The model was trained using a fully AI-powered synthetic training system, a novel approach detailed in the WizardLM-2 release blog post.

Usage Notes

WizardLM-2 adopts the Vicuna prompt format and supports multi-turn conversations. Prompts should follow that structure for best results.
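A minimal sketch of assembling a multi-turn prompt in the Vicuna style follows. It assumes the commonly documented Vicuna v1.1 layout (a fixed system sentence, `USER:` / `ASSISTANT:` role tags, and `</s>` closing each assistant reply). The helper name `build_vicuna_prompt` and its parameters are hypothetical, and the exact template should be checked against the upstream model card before use.

```python
# Assumed system sentence for the Vicuna-style format; verify against the model card.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_vicuna_prompt(turns, system=SYSTEM_PROMPT, eos="</s>"):
    """Build a Vicuna-style prompt string (hypothetical helper).

    turns: list of (user_message, assistant_reply) pairs. Use None as the
    reply of the final pair to leave the prompt open for the model to answer.
    """
    parts = [system + " "]
    for user_msg, assistant_msg in turns:
        # Each turn opens with the user message and the assistant tag.
        parts.append(f"USER: {user_msg} ASSISTANT:")
        if assistant_msg is not None:
            # Completed assistant replies are closed with the end-of-sequence token.
            parts.append(f" {assistant_msg}{eos}")
    return "".join(parts)


prompt = build_vicuna_prompt([("Hi", "Hello."), ("Who are you?", None)])
```

The resulting string ends with `ASSISTANT:`, so the model's generation becomes the next assistant reply. Earlier turns stay in the prompt verbatim, which is how multi-turn context is carried in this format.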

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each configuration sets the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
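As a rough illustration of how these parameters might be gathered into a single configuration, the sketch below collects them into a dictionary and rejects names outside the list above. The helper `build_sampler_settings` is hypothetical, and the values in the usage line are placeholders chosen for illustration only, not the settings Featherless users actually chose. Whether a given inference server accepts every one of these fields is also an assumption to verify.

```python
# The seven sampler parameters named in the list above.
ALLOWED_SAMPLER_KEYS = (
    "temperature",
    "top_p",
    "top_k",
    "frequency_penalty",
    "presence_penalty",
    "repetition_penalty",
    "min_p",
)


def build_sampler_settings(**settings):
    """Return a dict of sampler settings (hypothetical helper).

    Unknown parameter names raise ValueError; parameters left as None are
    omitted so the server's own defaults apply.
    """
    unknown = sorted(set(settings) - set(ALLOWED_SAMPLER_KEYS))
    if unknown:
        raise ValueError(f"unsupported sampler parameters: {unknown}")
    return {
        name: settings[name]
        for name in ALLOWED_SAMPLER_KEYS
        if settings.get(name) is not None
    }


# Placeholder values for illustration only.
example = build_sampler_settings(temperature=0.7, top_p=0.9, min_p=None)
```

Omitting unset parameters instead of sending explicit nulls keeps the request small and avoids overriding server-side defaults by accident.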