01-ai/Yi-34B-Chat

TEXT GENERATIONConcurrency Cost:2Model Size:34BQuant:FP8Ctx Length:32kPublished:Nov 22, 2023License:apache-2.0Architecture:Transformer0.4K Open Weights Cold

The Yi-34B-Chat model by 01.AI is a 34 billion parameter, instruction-tuned large language model built on the Transformer architecture, designed for bilingual (English and Chinese) applications. It features a 32K token context length and excels in language understanding, commonsense reasoning, and reading comprehension, outperforming many larger open-source models in various benchmarks. This model is particularly optimized for chat-based interactions and is suitable for personal, academic, and commercial use, offering a cost-effective solution with emergent abilities.

Loading preview...

Yi-34B-Chat: A Powerful Bilingual LLM by 01.AI

The Yi-34B-Chat model is a 34 billion parameter, instruction-tuned large language model developed by 01.AI. Built on the Transformer architecture, it is trained on a 3 trillion token multilingual corpus, making it highly proficient in both English and Chinese. This model has demonstrated strong performance, ranking second on the AlpacaEval Leaderboard (as of January 2024) and outperforming many other LLMs, including GPT-4 and Mixtral, in various benchmarks.

Key Capabilities

  • Bilingual Proficiency: Excels in both English and Chinese language understanding and generation.
  • Strong Reasoning: Shows promise in commonsense reasoning and reading comprehension tasks.
  • High Performance: Achieves top rankings in benchmarks like MMLU, CMMLU, BBH, and GSM8k among open-source models.
  • Extended Context: Supports a 32K token context length, with base models offering up to 200K tokens.
  • Quantized Versions: Available in 4-bit (AWQ) and 8-bit (GPTQ) quantized versions for deployment on consumer-grade GPUs.

Good For

  • Chat Applications: Optimized for diverse and coherent responses in conversational AI scenarios.
  • Bilingual Projects: Ideal for applications requiring strong performance in both English and Chinese.
  • Resource-Constrained Deployment: Quantized versions allow for efficient deployment on hardware with limited VRAM.
  • Fine-tuning: Provides a robust base for further fine-tuning for specific use cases, with guidance for both base and chat model fine-tuning.