zai-org/LongReward-llama3.1-8b-DPO
Task: Text Generation
Concurrency Cost: 1
Model Size: 8B
Quantization: FP8
Context Length: 32k
Published: Oct 22, 2024
Architecture: Transformer

LongReward-llama3.1-8b-DPO is an 8-billion-parameter causal language model developed by THUDM, based on the Llama 3.1 architecture and aligned with DPO. It is optimized for long-context understanding and generation, supporting a context window of up to 64K tokens. The model was fine-tuned on the LongReward-10k preference dataset, making it well suited to tasks that require comprehension of extensive context.
