W-61/mistral-7b-base-epsilon-dpo-hh-helpful-4xh200-batch-64

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 18, 2026 · Architecture: Transformer · Cold

W-61/mistral-7b-base-epsilon-dpo-hh-helpful-4xh200-batch-64 is a 7-billion-parameter language model fine-tuned from a Mistral-7B base using Epsilon DPO on the Anthropic/hh-rlhf dataset, with a focus on helpfulness and alignment. It is intended for tasks that call for helpful, aligned responses, building on its Mistral-7B foundation.


Model Overview

This model, W-61/mistral-7b-base-epsilon-dpo-hh-helpful-4xh200-batch-64, is a 7 billion parameter language model derived from a Mistral-7B base. It has undergone a specific fine-tuning process using Epsilon DPO (Direct Preference Optimization) on the Anthropic/hh-rlhf dataset. This training methodology aims to enhance the model's helpfulness and alignment with human preferences.
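The card does not define the "Epsilon" variant, so as a point of reference, here is a minimal plain-Python sketch of the standard DPO objective this training builds on: a logistic loss on the difference of log-probability ratios between the policy and a frozen reference model. The numeric values are illustrative placeholders, not taken from this model's training.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the total log-probability of the chosen or
    rejected completion under the policy or the frozen reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)), written in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Illustrative numbers: the policy favors the chosen response slightly
# more than the reference does, so the loss falls below log(2) ≈ 0.693.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.5, beta=0.1)
```

Widening the policy's margin between chosen and rejected completions drives the loss toward zero, which is the mechanism that aligns the model with the dataset's preferences.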

Key Training Details

The fine-tuning run used a learning rate of 5e-07, a total batch size of 64, and a single epoch. Final evaluation metrics show a loss of 0.5823 and a rewards accuracy of 0.7038, i.e. the model's implicit reward ranks the preferred response above the rejected one in roughly 70% of evaluation pairs. Training used Transformers 4.51.0, PyTorch 2.3.1+cu121, Datasets 2.21.0, and Tokenizers 0.21.4.
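In the usual DPO formulation, rewards accuracy is the fraction of preference pairs whose chosen completion receives a higher implicit reward than the rejected one. A minimal sketch of that metric, using made-up per-pair rewards (not values from this run):

```python
def rewards_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen response's implicit
    reward beats the rejected response's reward."""
    assert len(chosen_rewards) == len(rejected_rewards)
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)

# Hypothetical per-pair rewards: 3 of the 4 pairs are ranked correctly.
acc = rewards_accuracy([0.8, 0.1, 0.5, 0.9], [0.2, 0.4, 0.3, 0.6])
# → 0.75
```

Under this reading, the reported 0.7038 means about 7 in 10 held-out pairs were ranked in agreement with the human preference labels.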

Intended Use Cases

Given its fine-tuning on a helpfulness-focused preference dataset, this model is best suited to applications where helpful, aligned, preference-aware text generation matters. It is a reasonable candidate for tasks that demand nuanced responses prioritizing user assistance and ethical considerations, while retaining the general capabilities of the Mistral-7B architecture.
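Since the Anthropic/hh-rlhf dataset formats conversations as alternating "\n\nHuman:" / "\n\nAssistant:" turns, prompts at inference time will likely work best in the same shape. A small formatting helper, assuming that convention (the function name is illustrative, not part of any released tooling):

```python
def format_hh_prompt(turns):
    """Render (role, text) turns in the hh-rlhf style and leave a
    trailing 'Assistant:' cue for the model to complete."""
    parts = []
    for role, text in turns:
        assert role in ("Human", "Assistant")
        parts.append(f"\n\n{role}: {text}")
    parts.append("\n\nAssistant:")
    return "".join(parts)

prompt = format_hh_prompt([("Human", "How do I boil an egg?")])
# prompt == "\n\nHuman: How do I boil an egg?\n\nAssistant:"
```

Matching the fine-tuning format this way typically gives more reliable completions than free-form prompting, though the exact template used during training is an assumption here.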