Name: CEIA-RL/energy-exp1-dpo-offline API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: CEIA-RL

Overview

CEIA-RL/energy-exp1-dpo-offline is a 4 billion parameter language model, fine-tuned from the CEIA-RL/Energy base model. This model leverages the Direct Preference Optimization (DPO) method, as introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (Rafailov et al., 2023). The training was conducted using the TRL (Transformers Reinforcement Learning) framework.

Key Capabilities

Preference-aligned Text Generation: Optimized through DPO to generate responses that align with human preferences, making it suitable for tasks requiring nuanced or preferred outputs.
Instruction Following: Capable of generating text based on user prompts, as demonstrated by the quick start example.
Large Context Window: Supports a context length of 32768 tokens, allowing for processing and generating longer sequences of text.

Training Details

The model was trained using the DPO method, which directly optimizes a language model to align with human preferences without requiring an explicit reward model. The training utilized specific versions of key frameworks:

TRL: 0.29.0
Transformers: 4.57.6
Pytorch: 2.10.0
Datasets: 4.7.0
Tokenizers: 0.22.2

Good for

Generating preferred responses in interactive AI applications.
Tasks where output quality and alignment with human judgment are critical.
Exploring DPO-based fine-tuning for language models.

Overview

Overview

Key Capabilities

Training Details

Good for

Full Model Card (README)