wxzhang/selective-pairrm-33045197-mt0
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Published: Mar 21, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold
wxzhang/selective-pairrm-33045197-mt0 is a 7-billion-parameter instruction-tuned language model, fine-tuned by wxzhang from Mistral-7B-Instruct-v0.2. It was trained with Direct Preference Optimization (DPO) on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset, reaching a rewards accuracy of 0.6055 on the evaluation set. The model is intended to generate responses aligned with human preferences, making it suitable for tasks that require nuanced understanding and preference-aligned output.
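To make the training objective and the "rewards accuracy" metric above concrete, here is a minimal sketch of the standard DPO loss in plain Python. This is an illustrative implementation of the published DPO formulation, not the author's actual training code; the function names and the `beta=0.1` default are assumptions for the example. DPO scores each (chosen, rejected) pair via implicit rewards, the beta-scaled log-probability ratios between the policy and a frozen reference model, and rewards accuracy is simply the fraction of pairs where the chosen response gets the higher implicit reward.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probs.

    Implicit rewards are beta-scaled log-ratios between the policy
    and the frozen reference model; the loss is -log(sigmoid(margin)).
    """
    chosen_reward = beta * (policy_chosen_lp - ref_chosen_lp)
    rejected_reward = beta * (policy_rejected_lp - ref_rejected_lp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) computed stably as softplus(-margin)
    if margin > 0:
        loss = math.log1p(math.exp(-margin))
    else:
        loss = -margin + math.log1p(math.exp(margin))
    return loss, chosen_reward, rejected_reward

def rewards_accuracy(pairs, beta=0.1):
    """Fraction of pairs where the implicit chosen reward beats the
    rejected one -- the metric reported as 0.6055 on this model's eval set."""
    correct = 0
    for pc, pr, rc, rr in pairs:
        _, cr, rj = dpo_loss(pc, pr, rc, rr, beta)
        if cr > rj:
            correct += 1
    return correct / len(pairs)
```

When the policy prefers the chosen response more strongly than the reference does, the margin is positive and the loss drops below log 2; a margin of zero (policy no better than reference at separating the pair) gives exactly log 2 ≈ 0.693. In practice the per-sequence log-probs would come from summing token log-probs under the policy and reference models.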