MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.3
TEXT GENERATION

- Concurrency Cost: 1
- Model Size: 8B
- Quant: FP8
- Ctx Length: 8k
- Published: Apr 24, 2024
- License: llama3
- Architecture: Transformer

MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.3 is an 8-billion-parameter instruction-tuned language model, fine-tuned with Direct Preference Optimization (DPO) on top of Meta-Llama-3-8B-Instruct. The model card claims an extended context length of 32,000 tokens, achieved by modifying `rope_theta`, which improves its ability to process longer inputs. It is designed for general instruction-following tasks, with DPO fine-tuning intended to improve response quality and adherence to user preferences.
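Since this is a Llama-3 fine-tune, prompts are expected to follow the Llama 3 chat template. The sketch below builds that prompt format by hand, assuming this DPO fine-tune keeps the base Meta-Llama-3-8B-Instruct template (in practice, `tokenizer.apply_chat_template` from `transformers` handles this automatically):

```python
# Minimal sketch of the Llama 3 chat prompt format, assuming this fine-tune
# keeps the base model's template (header tokens + <|eot_id|> turn delimiters).
def build_llama3_prompt(messages):
    """messages: list of {"role": ..., "content": ...} dicts."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant header so the model generates the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
print(prompt)
```

The resulting string can be sent to any endpoint that serves this model as a raw completion request, or reproduced exactly via the tokenizer's built-in chat template.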
