chujiezheng/Llama3-8B-Chinese-Chat-ExPO

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:May 25, 2024License:llama3Architecture:Transformer0.0K Warm

The Llama3-8B-Chinese-Chat-ExPO model, developed by chujiezheng, is an 8 billion parameter language model with an 8192 token context length. It is an extrapolated (ExPO) version of the Llama3-8B-Chinese-Chat and Meta-Llama-3-8B-Instruct models, applying a novel extrapolation technique (alpha = 0.3) to enhance alignment with human preferences. This experimental model aims to improve performance in Chinese language tasks through its unique extrapolation method, showing win rate improvements on benchmarks like AlpacaEval 2.0 and MT-Bench.

Loading preview...

Model Overview

chujiezheng/Llama3-8B-Chinese-Chat-ExPO is an 8 billion parameter language model derived from shenzhi-wang/Llama3-8B-Chinese-Chat and meta-llama/Meta-Llama-3-8B-Instruct. Its core innovation lies in the application of an extrapolation (ExPO) technique with an alpha value of 0.3, based on the "Weak-to-Strong Extrapolation Expedites Alignment" paper. This method aims to achieve superior alignment with human preferences by extrapolating from SFT and DPO/RLHF checkpoints.

Key Characteristics

  • Extrapolation (ExPO) Method: Utilizes a unique extrapolation technique (alpha = 0.3) to enhance model alignment.
  • Experimental Chinese Focus: Specifically adapted for Chinese language, though noted as experimental with potential for unexpected issues.
  • Performance Improvements: Demonstrates improved win rates on the AlpacaEval 2.0 benchmark and higher scores on MT-Bench across various base models when the ExPO method is applied.

Evaluation Highlights

The model's effectiveness is showcased through evaluation results on standard benchmarks:

  • AlpacaEval 2.0: Consistently shows an increase in "Win Rate (+ ExPO)" and "LC Win Rate (+ ExPO)" compared to original models, indicating improved preference alignment.
  • MT-Bench: Achieves higher scores with the ExPO method, suggesting enhanced conversational quality and instruction following.

Considerations

This is an experimental model, and its Chinese language capabilities, while targeted for improvement, have not been comprehensively evaluated. Users should be aware that applying extrapolation to DPO/RLHF alignment training for new languages like Chinese may introduce unexpected behaviors.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p