Overview
LongReward-llama3.1-8b-DPO is an 8-billion-parameter language model from THUDM, fine-tuned with Direct Preference Optimization (DPO) on the dpo_llama3.1_8b split of the LongReward-10k dataset. Built on the Llama 3.1 architecture, it is designed for tasks that require a deep understanding of long contexts.
Key Capabilities
- Extended Context Window: Supports a context window of up to 64K tokens, significantly enhancing its ability to process and generate long-form content.
- DPO Fine-tuning: Leverages DPO training on a specialized long-context preference dataset, which improves its performance in generating coherent and relevant responses over extended inputs.
- Llama 3.1 Base: Benefits from the robust capabilities of the Llama 3.1 foundational model.
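DPO trains directly on pairs of preferred and rejected responses rather than fitting a separate reward model. As a rough illustration only (not the project's actual training code), the pairwise DPO loss over summed response log-probabilities can be sketched as:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Pairwise DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response under
    the trained policy or the frozen reference model; beta controls how
    far the policy is allowed to drift from the reference.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable -log(sigmoid(margin)): small when the policy
    # favors the chosen response more than the reference does.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

Minimizing this loss pushes the policy to rank the preferred long-context response above the rejected one while the reference term keeps it anchored to the base model.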
Good For
- Long-form Question Answering: Answering complex queries that require synthesizing information from very long documents or conversations.
- Document Summarization: Generating concise summaries from extensive texts.
- Contextual Chatbots: Developing conversational agents that maintain context over prolonged interactions.
- Information Extraction: Extracting specific details from large bodies of text where relevant information might be spread out.
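A minimal inference sketch for these use cases, assuming the model is published on the Hugging Face Hub as THUDM/LongReward-llama3.1-8b-DPO (check the model card for the exact repo id and chat template) and that `transformers` and `torch` are installed; the file name `report.txt` is a placeholder:

```python
def build_prompt(document: str, question: str) -> list:
    """Pair a long document with a query as a chat-style message list."""
    return [{"role": "user", "content": f"{document}\n\n{question}"}]

def answer_over_long_context(document: str, question: str,
                             max_new_tokens: int = 512) -> str:
    # Heavy dependencies are imported here so build_prompt stays usable
    # without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "THUDM/LongReward-llama3.1-8b-DPO"  # assumed Hub repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    input_ids = tokenizer.apply_chat_template(
        build_prompt(document, question),
        add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:],
                            skip_special_tokens=True)

# Example (placeholder file):
#   text = open("report.txt").read()
#   print(answer_over_long_context(text, "Summarize the key findings."))
```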
For more technical details, refer to the LongReward Paper and the GitHub Repository.