alvarobartt/Mistral-7B-v0.1-ORPO
Text Generation · Model Size: 7B · Quant: FP8 · Context Length: 4k · Concurrency Cost: 1 · Published: Mar 21, 2024 · License: apache-2.0 · Architecture: Transformer
alvarobartt/Mistral-7B-v0.1-ORPO is a 7-billion-parameter language model fine-tuned by alvarobartt on the Mistral-7B-v0.1 base model using ORPO (Odds Ratio Preference Optimization). ORPO is a single-stage preference-optimization technique that folds preference alignment directly into supervised fine-tuning via an odds-ratio penalty, removing the need for the separate reference model used by DPO or the reward model and rollouts used by PPO; this makes training faster and less memory-intensive than those two-stage pipelines. The model is suited to tasks that benefit from preference-based fine-tuning and supports a 4096-token context length.
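To make the single-stage idea concrete, here is a minimal sketch of the per-pair ORPO objective: the usual SFT negative log-likelihood on the preferred completion, plus a log-sigmoid penalty on the log-odds ratio between the chosen and rejected completions. The function name, the scalar inputs (mean per-token log-probabilities), and the weight `lam` are illustrative assumptions, not this model's training code.

```python
import math

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Sketch of a single-pair ORPO loss term (illustrative, not the actual
    training code). Inputs are mean per-token log-probabilities of the
    chosen and rejected completions under the policy being trained."""

    def log_odds(logp: float) -> float:
        # odds(y) = p / (1 - p); computed from log p for numerical clarity
        p = math.exp(logp)
        return logp - math.log(1.0 - p)

    # Odds-ratio term: -log sigmoid(log_odds(chosen) - log_odds(rejected))
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))

    # Standard SFT term: NLL of the preferred completion
    l_sft = -logp_chosen

    # Single combined objective -- no separate reference model is needed
    return l_sft + lam * l_or
```

The loss falls as the policy assigns higher probability to the chosen completion relative to the rejected one, so one backward pass through this combined objective performs both fine-tuning and preference alignment.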