princeton-nlp/Mistral-7B-Base-SFT-RDPO
Text Generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: May 17, 2024 · Architecture: Transformer
princeton-nlp/Mistral-7B-Base-SFT-RDPO is a 7-billion-parameter language model from princeton-nlp, built on the Mistral architecture. As the name indicates, it was first supervised fine-tuned (SFT) and then trained with R-DPO, a length-regularized variant of Direct Preference Optimization, rather than with SimPO itself; it was released as part of the baseline model suite accompanying the SimPO (Simple Preference Optimization with a Reference-Free Reward) research preprint. Its primary differentiator is this preference-optimization training, which makes it suited to tasks where outputs should align with human preferences.