W-61/mistral-7b-base-beta-dpo-hh-harmless-4xh200-batch-64

Task: Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 18, 2026 · Architecture: Transformer · Status: Cold

W-61/mistral-7b-base-beta-dpo-hh-harmless-4xh200-batch-64 is a 7 billion parameter Mistral-based language model fine-tuned on the Anthropic/hh-rlhf dataset. This model specializes in generating harmless and helpful responses, leveraging Direct Preference Optimization (DPO) for alignment. It is designed for applications requiring robust safety and adherence to ethical guidelines in conversational AI.


Model Overview

W-61/mistral-7b-base-beta-dpo-hh-harmless-4xh200-batch-64 is a 7-billion-parameter language model built on the Mistral-7B base and fine-tuned with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. The fine-tuning targets harmless, helpful outputs that align with safety and ethical standards.

Key Characteristics

  • Base Model: Mistral-7B
  • Fine-tuning Method: Direct Preference Optimization (DPO)
  • Dataset: Anthropic/hh-rlhf, focusing on harmlessness and helpfulness
  • Parameter Count: 7 billion
  • Context Length: 4096 tokens
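Since the model was tuned on Anthropic/hh-rlhf, prompts should follow that dataset's transcript convention of alternating `\n\nHuman:` / `\n\nAssistant:` turns, ending with an empty Assistant turn for the model to complete. A minimal sketch (the helper name is illustrative, not part of the model's API):

```python
def format_hh_prompt(turns):
    """Format a conversation in the Anthropic hh-rlhf transcript style.

    `turns` is a list of (role, text) pairs, where role is "user" or
    "assistant". The returned string ends with "\n\nAssistant:" so the
    model continues from the assistant's turn.
    """
    parts = []
    for role, text in turns:
        label = "Human" if role == "user" else "Assistant"
        parts.append(f"\n\n{label}: {text}")
    parts.append("\n\nAssistant:")
    return "".join(parts)

prompt = format_hh_prompt([("user", "How do I stay safe online?")])
# prompt == "\n\nHuman: How do I stay safe online?\n\nAssistant:"
```

The resulting string can be passed to any standard text-generation pipeline; staying close to the fine-tuning format generally yields better-aligned completions.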

Training Details

The model was trained for a single epoch with a learning rate of 5e-07 and a total batch size of 64 across 4 GPUs (reflected in the model name: 4×H200, batch 64). Evaluation metrics such as the DPO loss and mean reward margins track how closely the model aligns with the preference data over training. The run used Transformers 4.51.0 and PyTorch 2.3.1+cu121.
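The DPO loss minimized during training is the negative log-sigmoid of a scaled preference margin: for a chosen/rejected response pair, the margin is the policy-vs-reference log-probability ratio of the chosen response minus that of the rejected one, scaled by the temperature β (the "beta" in the model name; the exact value used here is not published, so 0.1 below is only a common default). A self-contained sketch for a single pair:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    Each argument is a summed token log-probability of the full response
    under the policy or the frozen reference model. A positive margin
    means the policy prefers the chosen response more strongly than the
    reference does, driving the loss below log(2).
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = chosen_ratio - rejected_ratio
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably with log1p
    return math.log1p(math.exp(-beta * margin))

# At initialization the policy equals the reference, so every margin is
# zero and the loss starts at log(2) ≈ 0.693.
initial_loss = dpo_loss(0.0, 0.0, 0.0, 0.0)
```

The "margin means" reported during evaluation are averages of `beta * margin` over the evaluation pairs; rising margins and falling loss indicate the policy is increasingly separating chosen from rejected responses.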

Intended Use Cases

This model is particularly well-suited for applications where generating safe, non-toxic, and helpful responses is paramount. Its DPO fine-tuning on a harmlessness dataset makes it a strong candidate for:

  • Safe conversational AI: Chatbots or virtual assistants requiring strict adherence to safety guidelines.
  • Content moderation: Assisting in filtering or generating benign text.
  • Ethical AI research: As a baseline for further alignment studies.