CorticalStack/mistral-7b-distilabel-truthy-dpo

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 8k · Published: Mar 5, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

CorticalStack/mistral-7b-distilabel-truthy-dpo is a 7 billion parameter language model, fine-tuned from Mistral-7B-v0.1 on the distilabel-truthy-dpo-v0.1 dataset. It uses DPO (Direct Preference Optimization) to improve truthfulness and alignment, and is intended for tasks that require accurate, well-aligned responses.


CorticalStack/mistral-7b-distilabel-truthy-dpo Overview

This model is a 7 billion parameter language model, specifically a DPO (Direct Preference Optimization) fine-tuned version of the original mistralai/Mistral-7B-v0.1 base model. The fine-tuning process utilized the mlabonne/distilabel-truthy-dpo-v0.1 dataset, indicating an emphasis on improving the model's truthfulness and alignment with preferred responses.

Key Characteristics

  • Base Model: Mistral-7B-v0.1
  • Fine-tuning Method: Direct Preference Optimization (DPO)
  • Training Dataset: mlabonne/distilabel-truthy-dpo-v0.1, suggesting a focus on truthfulness.
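DPO trains the policy directly on preference pairs, without a separate reward model. Using the standard DPO objective (Rafailov et al.), with the base Mistral-7B-v0.1 serving as the frozen reference model π_ref, the per-pair loss is:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
= -\log \sigma\!\left(
    \beta \left[
      \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right]
  \right)
```

where y_w and y_l are the preferred and rejected responses to prompt x, and β (set to 0.1 in this model's training configuration) controls how far the fine-tuned policy may drift from the reference.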

Training Configuration

The fine-tuning used the following LoRA (Low-Rank Adaptation) and training arguments:

  • LoRA Parameters:
    • r: 16
    • LoRA alpha: 16
    • LoRA dropout: 0.05
  • Training Arguments:
    • Batch size: 4
    • Gradient accumulation steps: 4
    • Optimizer: paged_adamw_32bit
    • Max steps: 100
    • Learning rate: 5e-05
    • Learning rate scheduler type: cosine
    • Beta: 0.1
    • Max prompt length: 1024
    • Max length: 1536
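The numbers above imply a few derived quantities worth noting: with a batch size of 4 and 4 gradient accumulation steps, each optimizer update sees 16 preference pairs, and the cosine scheduler decays the learning rate from 5e-05 toward zero over the 100 steps. A minimal sketch (assuming no warmup, which the card does not mention):

```python
import math

# Hyperparameters taken from the model card's training configuration.
batch_size = 4
grad_accum_steps = 4
max_steps = 100
peak_lr = 5e-05

# Effective batch size per optimizer update.
effective_batch = batch_size * grad_accum_steps  # 16 preference pairs per step

def cosine_lr(step, total=max_steps, peak=peak_lr):
    """Cosine decay from peak_lr to ~0 over total steps (no warmup assumed)."""
    progress = min(step, total) / total
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))

print(effective_batch)    # 16
print(cosine_lr(0))       # 5e-05 at the start
print(cosine_lr(50))      # 2.5e-05 halfway through
print(cosine_lr(100))     # ~0 at the end
```

With only 100 optimizer steps, this run processes roughly 1,600 preference pairs, i.e. a short adaptation pass rather than a full retraining, which is typical for LoRA-based DPO fine-tunes.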

Potential Use Cases

This model is suitable for applications where generating factually accurate and aligned text is crucial, benefiting from the DPO fine-tuning on a truth-focused dataset.
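The model card does not specify a chat template. Mistral-derived fine-tunes commonly follow the `[INST] ... [/INST]` format of the Mistral-7B-Instruct family, so a reasonable (but unverified) prompt builder might look like this:

```python
from typing import Optional

def format_mistral_prompt(user_message: str, system: Optional[str] = None) -> str:
    """Build a Mistral-style instruction prompt.

    NOTE: this [INST] format is an assumption based on common Mistral
    fine-tune conventions; the model card does not state a chat template,
    so verify against the model's tokenizer config before relying on it.
    """
    content = f"{system}\n\n{user_message}" if system else user_message
    return f"<s>[INST] {content} [/INST]"

prompt = format_mistral_prompt(
    "Is the Great Wall of China visible from the Moon?",
    system="Answer truthfully and concisely.",
)
print(prompt)
```

Keep prompts within the training limits listed above (max prompt length 1024 tokens, max total length 1536) for behavior closest to the fine-tuning regime.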

Popular Sampler Settings

The top three parameter combinations used by Featherless users for this model adjust the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.