01-ai/Yi-34B

Text generation | Concurrency cost: 2 | Model size: 34B | Quant: FP8 | Context length: 32k | Published: Nov 1, 2023 | License: apache-2.0 | Architecture: Transformer | Open weights

Yi-34B is a 34-billion-parameter open-source large language model developed by 01.AI and built on the Transformer architecture. Trained on a 3T-token multilingual corpus, it performs well at bilingual language understanding, common-sense reasoning, and reading comprehension. Its strong results on both English and Chinese benchmarks make it suitable for a wide range of general-purpose language tasks.


Overview

Yi-34B is a 34-billion-parameter large language model in the Yi series, developed by 01.AI. It uses the same Transformer architecture as Llama but is not a Llama derivative: 01.AI trained it from scratch on its own high-quality datasets using its own training infrastructure. The model is bilingual (English and Chinese), was trained on a 3T-token multilingual corpus, and supports a context length of 32,768 tokens.
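Because prompt tokens and generated tokens share the same 32,768-token window, a caller usually has to reserve room for the output. The sketch below is a hypothetical helper, not part of any Yi or 01.AI tooling: it applies simple left truncation, keeping the most recent prompt tokens so that prompt plus generation fits. Token IDs are plain integers here; a real application would obtain them from the model's tokenizer.

```python
# Hypothetical helper: fit a prompt into a fixed context window by
# dropping the oldest tokens (left truncation). Token IDs are plain ints;
# in practice they would come from the model's tokenizer.
from typing import List

CONTEXT_LENGTH = 32_768  # context length stated for Yi-34B


def fit_to_context(prompt_ids: List[int], max_new_tokens: int,
                   context_length: int = CONTEXT_LENGTH) -> List[int]:
    """Return the most recent prompt tokens that leave room for generation."""
    if not 0 <= max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be in [0, context_length)")
    budget = context_length - max_new_tokens
    # Keep only the last `budget` tokens; shorter prompts are returned unchanged.
    if len(prompt_ids) > budget:
        return prompt_ids[-budget:]
    return list(prompt_ids)


if __name__ == "__main__":
    ids = list(range(40_000))  # stand-in for 40,000 prompt token IDs
    kept = fit_to_context(ids, max_new_tokens=1_024)
    print(len(kept), kept[0])  # 31744 8256
```

Dropping the oldest tokens is only one policy; summarizing or chunking earlier context are common alternatives when the beginning of a document matters.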

Key Capabilities

  • Bilingual Proficiency: Excels in both English and Chinese language understanding and generation.
  • Strong Benchmark Results: The chat variant, Yi-34B-Chat, placed second on the AlpacaEval Leaderboard (data as of January 2024), and the Yi-34B base model ranked first among open-source models on the Hugging Face Open LLM Leaderboard and C-Eval (data as of November 2023).
  • Extended Context Window: Supports a 32K-token context; a separate 200K-context variant (Yi-34B-200K) handles very long texts and shows improved performance on "Needle-in-a-Haystack" retrieval tests.
  • Quantization Support: Quantized versions of the chat model are available in 4-bit (AWQ) and 8-bit (GPTQ) formats, enabling deployment on consumer-grade GPUs.

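To illustrate why quantization matters for deployment, the sketch below estimates the memory needed just to hold the weights at different precisions. It assumes a nominal 34e9 parameters and deliberately ignores the KV cache, activations, and the per-group scale and zero-point metadata that AWQ and GPTQ add, so real footprints will be somewhat higher. The 24 GB budget corresponds to a single RTX 3090 or RTX 4090.

```python
# Rough weight-only memory estimate for a ~34B-parameter model.
# Assumptions: nominal 34e9 parameters; KV cache, activations, and
# quantization metadata (scales/zero-points) are ignored.

PARAMS = 34e9       # nominal parameter count (assumption)
GPU_BUDGET_GB = 24  # single RTX 3090 / RTX 4090

# Bits per weight for each storage format.
PRECISIONS = {
    "FP16/BF16": 16,
    "FP8": 8,
    "8-bit (GPTQ)": 8,
    "4-bit (AWQ)": 4,
}


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Return the memory (decimal GB) needed to store the weights alone."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bits in PRECISIONS.items():
        gb = weight_memory_gb(PARAMS, bits)
        fits = "fits" if gb < GPU_BUDGET_GB else "does not fit"
        print(f"{name:>12}: {gb:5.1f} GB of weights ({fits} in {GPU_BUDGET_GB} GB)")
```

Under these assumptions, FP16 weights need about 68 GB and 8-bit or FP8 weights about 34 GB, while 4-bit weights need about 17 GB, which is the only option here that leaves headroom on a single 24 GB card.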
Use Cases

  • General-purpose applications: Suitable for a broad range of tasks requiring strong language understanding and generation.
  • Bilingual applications: Ideal for scenarios demanding high performance in both English and Chinese.
  • Long-context tasks: The 200K-context variant is well suited to processing and reasoning over very long documents.
  • Resource-constrained environments: Quantized models allow deployment on GPUs with limited VRAM, such as the RTX 3090 or RTX 4090 (24 GB each).
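Long contexts also cost memory beyond the weights, because the attention keys and values for every token are cached during generation. The sketch below estimates that KV cache for a grouped-query-attention Transformer. The architecture values (60 layers, 8 key/value heads, head dimension 128) are assumptions believed to match the published Yi-34B configuration; check them against the model's `config.json` before relying on the numbers. The `AttentionConfig` class and `kv_cache_bytes` function are hypothetical names for illustration.

```python
# Rough KV-cache size estimate for a grouped-query-attention Transformer.
# The architecture values below are ASSUMPTIONS for illustration; verify
# them against the model's config.json before relying on the numbers.
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    num_layers: int
    num_kv_heads: int
    head_dim: int


# Assumed values for Yi-34B (hypothetical until checked against config.json).
ASSUMED_YI_34B = AttentionConfig(num_layers=60, num_kv_heads=8, head_dim=128)


def kv_cache_bytes(cfg: AttentionConfig, num_tokens: int,
                   bytes_per_elem: int = 2, batch_size: int = 1) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens.

    The leading factor of 2 accounts for storing both K and V in every layer;
    bytes_per_elem=2 corresponds to an FP16/BF16 cache.
    """
    per_token = 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * bytes_per_elem
    return per_token * num_tokens * batch_size


if __name__ == "__main__":
    for ctx in (4_096, 32_768, 200_000):
        gib = kv_cache_bytes(ASSUMED_YI_34B, ctx) / 2**30
        print(f"{ctx:>7} tokens -> {gib:6.2f} GiB of KV cache (FP16, batch 1)")
```

With these assumed values the cache grows linearly with context length: about 7.5 GiB for a full 32K window and roughly 46 GiB at 200K tokens, on top of the weights. This is why long-context inference with the 200K variant needs considerably more memory than short prompts.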