weqweasdas/zephyr-7b-dpo-full
Task: Text generation
Model size: 7B
Quantization: FP8
Context length: 4k
Concurrency cost: 1
Published: Apr 30, 2024
License: apache-2.0
Architecture: Transformer (open weights)
weqweasdas/zephyr-7b-dpo-full is a 7-billion-parameter language model fine-tuned from alignment-handbook/zephyr-7b-sft-full using Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. DPO trains the model directly on pairs of chosen and rejected responses, aligning its outputs with human preferences without a separate reward model. This makes it well suited to conversational AI and instruction-following tasks. The model supports a context length of 4096 tokens.
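As a chat-tuned model, it expects prompts in the Zephyr chat format. The sketch below is a minimal illustration, not an official snippet: `build_zephyr_prompt` assumes the Zephyr-style template with `<|system|>`, `<|user|>`, and `<|assistant|>` role markers (verify against the model's own `tokenizer.apply_chat_template`), and the `generate` helper assumes the Hugging Face `transformers` library is installed.

```python
def build_zephyr_prompt(messages):
    """Format a list of {"role", "content"} messages as a Zephyr-style prompt.

    Assumed template (check tokenizer.apply_chat_template for the exact one):
        <|role|>\n{content}</s>\n ... <|assistant|>\n
    """
    parts = [f"<|{m['role']}|>\n{m['content']}</s>" for m in messages]
    parts.append("<|assistant|>")  # generation continues from here
    return "\n".join(parts) + "\n"


def generate(messages, max_new_tokens=256):
    """Load the model and generate a reply (hypothetical helper; the model
    download is several GB, so this is not run at import time)."""
    from transformers import pipeline

    pipe = pipeline("text-generation", model="weqweasdas/zephyr-7b-dpo-full")
    prompt = build_zephyr_prompt(messages)
    out = pipe(prompt, max_new_tokens=max_new_tokens, return_full_text=False)
    return out[0]["generated_text"]
```

In practice, preferring `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` over a hand-rolled formatter avoids drift if the template embedded in the tokenizer differs from the one assumed here.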