chujiezheng/Mistral7B-PairRM-SPPO-ExPO is a 7-billion-parameter Mistral-based language model released by chujiezheng. It is obtained by applying weak-to-strong extrapolation (ExPO) with alpha = 0.3, using UCLA-AGI/Mistral7B-PairRM-SPPO as the aligned (strong) checkpoint and mistralai/Mistral-7B-Instruct-v0.2 as the base (weak) checkpoint, to further strengthen alignment with human preferences. The extrapolated model outperforms the original SPPO checkpoint on benchmarks such as AlpacaEval 2.0 and MT-Bench, making it suitable for tasks where improved human preference alignment matters.
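The sketch below illustrates how such an extrapolated checkpoint could be reproduced, assuming the standard ExPO update theta_expo = theta_aligned + alpha * (theta_aligned - theta_base); it is an illustrative outline, not the author's release script, and the output directory name is hypothetical.

```python
# Sketch of ExPO weight extrapolation with alpha = 0.3, assuming the formula
# theta_expo = theta_aligned + alpha * (theta_aligned - theta_base).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

alpha = 0.3

# Weak (base) and strong (aligned) checkpoints named in the description above.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", torch_dtype=torch.bfloat16
)
aligned = AutoModelForCausalLM.from_pretrained(
    "UCLA-AGI/Mistral7B-PairRM-SPPO", torch_dtype=torch.bfloat16
)

# Push every parameter further along the direction from the base model
# toward the aligned model (weak-to-strong extrapolation).
with torch.no_grad():
    base_params = dict(base.named_parameters())
    for name, param in aligned.named_parameters():
        param.add_(alpha * (param - base_params[name]))

# Save the extrapolated model and tokenizer (hypothetical output path).
aligned.save_pretrained("Mistral7B-PairRM-SPPO-ExPO-local")
AutoTokenizer.from_pretrained("UCLA-AGI/Mistral7B-PairRM-SPPO").save_pretrained(
    "Mistral7B-PairRM-SPPO-ExPO-local"
)
```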