W-61/mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64
W-61/mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64 is a 7-billion-parameter language model released by W-61. It is a Direct Preference Optimization (DPO) fine-tune of the Mistral-7B base model, trained on the Anthropic/hh-rlhf dataset to produce helpful and harmless responses. This makes it well suited to conversational AI and assistant applications where safety and utility are paramount. The model supports a context length of 4096 tokens.
Model Overview
This model, W-61/mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64, is a 7 billion parameter language model developed by W-61. It is a fine-tuned version of a Mistral-7B base model, specifically enhanced through Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. This training methodology aims to align the model's outputs with human preferences for helpfulness and harmlessness.
Key Capabilities
- Preference Alignment: Optimized using DPO on the Anthropic/hh-rlhf dataset to produce responses that are both helpful and harmless.
- Base Architecture: Built upon the Mistral-7B architecture, known for its efficiency and strong performance in its size class.
- Context Window: Supports a context length of 4096 tokens, allowing it to process moderately long inputs such as multi-turn conversations.
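A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub under this repo id and loads with the standard `transformers` APIs (the prompt and generation settings are illustrative, not from the training code):

```python
# Hypothetical usage sketch; assumes `transformers` and a suitable GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "How do I politely decline a meeting invitation?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a fine-tune of a base (non-chat) model, check the repository for any expected prompt format before deploying.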
Training Details
The model was trained for a single epoch with a learning rate of 5e-7, a total batch size of 64, a cosine learning-rate scheduler, and a warmup ratio of 0.1. The final training loss of 0.6015, together with the run's reported DPO metrics, indicates that the policy learned to favor the preferred responses in the dataset (the DPO loss starts at ln 2 ≈ 0.693 when the policy still matches the reference model).
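For reference, the standard DPO objective that these numbers describe can be sketched in plain Python; the function and argument names below are illustrative and not taken from the actual training code:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta=0.1 is a common default; the value used for this model
    is not stated on the card.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(x), computed stably as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))

# At initialization (policy == reference) the loss is ln 2 ~ 0.693;
# a policy that prefers the chosen response pushes it lower.
print(round(dpo_loss(-1.0, -5.0, -2.0, -4.0), 3))  # 0.598
```

A final loss below ln 2, such as the 0.6015 reported here, is consistent with the policy shifting probability mass toward the preferred completions.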
Good For
- Conversational AI: Ideal for chatbots and virtual assistants where generating helpful, safe, and human-aligned responses is critical.
- Content Moderation: Its harmlessness-oriented tuning can support applications that must adhere to specific safety guidelines, though it is a generative model rather than a dedicated classifier.
- Research: Suitable for researchers exploring DPO techniques and preference alignment on Mistral-7B models.