shenzhi-wang/Llama3.1-70B-Chinese-Chat

  • Task: Text generation
  • Concurrency Cost: 4
  • Model Size: 70B
  • Quantization: FP8
  • Context Length: 32K
  • Published: Jul 25, 2024
  • License: llama3.1
  • Architecture: Transformer



Overview

shenzhi-wang/Llama3.1-70B-Chinese-Chat is a 70-billion-parameter instruction-tuned language model developed by Shenzhi Wang and Yaowei Zheng. It is built on the Meta-Llama-3.1-70B-Instruct base model. The model is fine-tuned for both Chinese and English users, which makes it well suited to bilingual use. Fine-tuning used ORPO (Odds Ratio Preference Optimization), a monolithic preference-optimization method that needs no reference model. ORPO combines the supervised fine-tuning loss with an odds-ratio preference penalty in a single training stage.
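The single-stage objective can be illustrated for one preference pair with the minimal sketch below. It assumes the inputs are length-normalized (average per-token) log-probabilities of the chosen and rejected responses. It also assumes that the ORPO beta of 0.05 listed under the training details weights the odds-ratio term. The helper names (`log_sigmoid`, `log_odds`, `orpo_loss`) are hypothetical; this is not the authors' training code.

```python
import math

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) where p = exp(avg_logp), assuming avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, beta: float = 0.05) -> float:
    """ORPO loss for one preference pair (hypothetical helper).

    Assumption: beta weights the odds-ratio penalty added to the SFT loss.
    """
    sft_loss = -chosen_avg_logp  # negative log-likelihood of the preferred response
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    or_loss = -log_sigmoid(log_odds_ratio)  # shrinks as chosen odds exceed rejected odds
    return sft_loss + beta * or_loss
```

When the chosen and rejected responses are equally likely, the odds-ratio term equals log 2. Raising the chosen response's likelihood relative to the rejected one lowers both terms.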

Key Capabilities

  • Bilingual Support: Optimized for both Chinese and English users.
  • Enhanced Roleplay: Demonstrates significant improvements in roleplaying scenarios.
  • Function Calling: Features enhanced capabilities for function calling, useful for tool-use applications.
  • Mathematical Proficiency: Exhibits improved performance in mathematical tasks.
  • Context Length: Served here with a 32,768-token (32K) context window; the Meta-Llama-3.1-70B-Instruct base model itself supports up to 128K tokens.
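With a fixed 32,768-token window, long chats must be trimmed so that the prompt plus the requested generation length still fits. The sketch below drops the oldest non-system turns until that budget is met. It rests on several assumptions:

  • Token counts are precomputed with the model's tokenizer.
  • Chat-template overhead tokens are ignored.
  • The `Message` class and `trim_history` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List

CONTEXT_WINDOW = 32_768  # context length listed for this deployment

@dataclass
class Message:
    role: str        # "system", "user" or "assistant"
    content: str
    n_tokens: int    # assumption: precomputed with the model's tokenizer

def trim_history(messages: List[Message], max_new_tokens: int,
                 context_window: int = CONTEXT_WINDOW) -> List[Message]:
    """Drop the oldest non-system turns until prompt + generation fits (hypothetical helper)."""
    budget = context_window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    system = [m for m in messages if m.role == "system"]
    rest = [m for m in messages if m.role != "system"]
    used = sum(m.n_tokens for m in system) + sum(m.n_tokens for m in rest)
    # Remove the oldest turns first.
    while rest and used > budget:
        used -= rest.pop(0).n_tokens
    # Keep the remaining history starting on a user turn.
    while rest and rest[0].role == "assistant":
        used -= rest.pop(0).n_tokens
    if used > budget:
        raise ValueError("system prompt alone exceeds the token budget")
    return system + rest
```

For example, with a 1,000-token system prompt, turns of 20,000, 8,000 and 3,000 tokens, and `max_new_tokens=4096`, the function drops the first user turn. It then drops the orphaned assistant reply and keeps the system prompt plus the latest user turn.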

Training Details

The model was fine-tuned for 3 epochs with a learning rate of 1.5e-6 and a cosine learning-rate schedule. Training used a cutoff (maximum sequence) length of 8,192 tokens, an ORPO beta of 0.05, full-parameter fine-tuning, and the paged_adamw_32bit optimizer. The training dataset contained more than 100,000 preference pairs.
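The sketch below shows how a cosine schedule with a 1.5e-6 peak evolves over training. It assumes a half-cosine decay to zero, the same shape as Hugging Face transformers' `get_cosine_schedule_with_warmup` with its default half cycle. The card reports neither a warmup length nor a total step count, so `warmup_steps` defaults to 0 as an assumption and `total_steps` is left to the caller. The constant and function names are hypothetical.

```python
import math

# Hyperparameters reported in the training details above.
NUM_EPOCHS = 3
PEAK_LR = 1.5e-6
CUTOFF_LEN = 8192          # maximum tokens per training sequence
ORPO_BETA = 0.05
OPTIMIZER = "paged_adamw_32bit"

def cosine_lr(step: int, total_steps: int, base_lr: float = PEAK_LR,
              warmup_steps: int = 0) -> float:
    """Linear warmup, then half-cosine decay from base_lr to 0 (hypothetical helper).

    warmup_steps defaults to 0 because no warmup length is reported (assumption).
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In practice, `total_steps` would be `NUM_EPOCHS` times the number of optimizer steps per epoch. That value depends on the dataset size and the effective batch size, and the batch size is not reported.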

Good for

  • Applications requiring strong bilingual (Chinese/English) conversational abilities.
  • Use cases involving complex roleplaying or character interactions.
  • Scenarios where robust function calling and tool integration are necessary.
  • Tasks demanding improved mathematical reasoning and problem-solving.
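Tool integration typically means parsing a structured call emitted by the model, running the matching local function, and returning the result as a new message. This card does not document the tool-call format. The sketch below therefore assumes the model emits a JSON object of the form `{"name": ..., "arguments": {...}}`, and the actual format depends on the chat template and prompt in use. The tools `get_weather` and `add`, the `TOOLS` registry, and `dispatch_tool_call` are all hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical local tools exposed to the model.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub; a real tool would query a weather service

def add(a: float, b: float) -> float:
    return a + b

TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather, "add": add}

def dispatch_tool_call(raw: str) -> str:
    """Run the tool named in a JSON tool call and return a JSON result string.

    Assumption: the call looks like {"name": ..., "arguments": {...}}.
    """
    try:
        call = json.loads(raw)
        if not isinstance(call, dict):
            raise ValueError("tool call must be a JSON object")
        args = call.get("arguments", {})
        if isinstance(args, str):  # some formats encode arguments as a JSON string
            args = json.loads(args)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return json.dumps({"error": f"invalid tool call: {exc}"})
    name = call.get("name")
    tool = TOOLS.get(name)
    if tool is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = tool(**args)
    except TypeError as exc:  # wrong, missing or non-mapping arguments
        return json.dumps({"error": f"bad arguments: {exc}"})
    return json.dumps({"name": name, "result": result})
```

Returning errors as JSON rather than raising them lets the application pass the failure back to the model, which can then correct its call.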

Popular Sampler Settings

The three parameter combinations most used by Featherless users for this model cover the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
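To show what each listed parameter does, the sketch below turns raw next-token logits into a sampling distribution. It assumes common conventions:

  • repetition_penalty is CTRL-style and multiplicative.
  • frequency_penalty and presence_penalty are OpenAI-style and additive.
  • Temperature scaling comes next, then top_k, top_p and min_p filters, each computed on the temperature-scaled distribution and intersected.

Real inference engines, including whatever Featherless runs, may apply these steps in a different order or with different formulas. The function `apply_sampler` is hypothetical, and any values passed to it are illustrative rather than the popular Featherless configurations.

```python
import math
from collections import Counter
from typing import List

def apply_sampler(logits: List[float], generated: List[int], temperature: float = 1.0,
                  top_k: int = 0, top_p: float = 1.0, min_p: float = 0.0,
                  repetition_penalty: float = 1.0, frequency_penalty: float = 0.0,
                  presence_penalty: float = 0.0) -> List[float]:
    """Return a normalized next-token distribution (hypothetical helper; ordering is an assumption)."""
    logits = list(logits)
    for tok, count in Counter(generated).items():
        # CTRL-style repetition penalty: push already-seen tokens toward lower scores.
        logits[tok] = logits[tok] / repetition_penalty if logits[tok] > 0 else logits[tok] * repetition_penalty
        # OpenAI-style penalties: per-occurrence (frequency) plus once-if-present (presence).
        logits[tok] -= frequency_penalty * count + presence_penalty
    t = max(temperature, 1e-6)  # near-zero temperature approximates greedy decoding
    scaled = [x / t for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for numerical stability
    z = sum(exps)
    probs = [e / z for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)
    if top_k > 0:  # keep the k most likely tokens
        keep &= set(order[:top_k])
    if top_p < 1.0:  # keep the smallest prefix whose cumulative mass reaches top_p
        nucleus, cum = set(), 0.0
        for i in order:
            nucleus.add(i)
            cum += probs[i]
            if cum >= top_p:
                break
        keep &= nucleus
    if min_p > 0.0:  # drop tokens below min_p times the top token's probability
        keep &= {i for i in order if probs[i] >= min_p * probs[order[0]]}
    total = sum(probs[i] for i in keep)  # the top token always survives, so total > 0
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]
```

A token can then be drawn with `random.choices(range(len(p)), weights=p)`. Every filter retains the most likely token, so the returned distribution is never empty.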