CorticalStack/mistral-7b-jondurbin-truthy-dpo

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 8k | Published: Mar 5, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

CorticalStack/mistral-7b-jondurbin-truthy-dpo is a 7-billion-parameter language model fine-tuned from Mistral-7B-v0.1 on the jondurbin/truthy-dpo-v0.1 dataset. It was trained with DPO (Direct Preference Optimization) with the aim of improving truthfulness and alignment, which makes it a candidate for applications that need factually consistent text generation. It keeps Mistral's 8192-token context length.


CorticalStack/mistral-7b-jondurbin-truthy-dpo Overview

This is a 7-billion-parameter language model fine-tuned from mistralai/Mistral-7B-v0.1. What sets it apart from the base model is its fine-tuning: Direct Preference Optimization (DPO) on the jondurbin/truthy-dpo-v0.1 dataset. DPO trains on pairs of preferred and rejected responses, and here the goal is to make the model more likely to produce factually consistent, truthful answers.
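To make the DPO objective concrete, here is a minimal sketch of the standard per-example DPO loss in plain Python. It is not the training code used for this model. The function name `dpo_loss` and its argument names are hypothetical, and the inputs are assumed to be summed token log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    All four inputs are log-probabilities of full responses (assumed to be
    summed over tokens). beta controls how strongly the policy is pushed
    away from the reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities, for illustration only.
untrained = dpo_loss(-12.0, -15.0, -12.0, -15.0)  # policy equals reference
improved = dpo_loss(-10.0, -20.0, -12.0, -15.0)   # policy now favors the chosen answer
```

When the policy is identical to the reference, the margin is zero and the loss equals log 2. The loss falls as the policy raises the probability of the chosen response relative to the rejected one.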

Key Training Details

  • Base Model: mistralai/Mistral-7B-v0.1
  • Fine-tuning Method: Direct Preference Optimization (DPO)
  • Dataset: jondurbin/truthy-dpo-v0.1
  • LoRA Configuration:
    • r: 16
    • LoRA alpha: 16
    • LoRA dropout: 0.05
  • Training Arguments:
    • Batch size: 4
    • Gradient accumulation steps: 4
    • Optimizer: paged_adamw_32bit
    • Max steps: 100
    • Learning rate: 5e-05
    • Learning rate scheduler type: cosine
    • Beta: 0.1
    • Max prompt length: 1024
    • Max length: 1536
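
The numbers above imply a fairly small training budget. The sketch below works through the arithmetic using only the listed values. It assumes a single training device, that "Batch size" means the per-device batch size, that no warmup was used, and that "Max length" covers prompt plus response; none of these are stated in the card. The helper names are hypothetical.

```python
import math

# Values copied from the training details above.
BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 100
LEARNING_RATE = 5e-5
LORA_R = 16
LORA_ALPHA = 16
MAX_PROMPT_LENGTH = 1024
MAX_LENGTH = 1536


def effective_batch_size(batch_size, grad_accum_steps, num_devices=1):
    """Preference pairs consumed per optimizer step (num_devices=1 is an assumption)."""
    return batch_size * grad_accum_steps * num_devices


def lora_scaling(alpha, r):
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


def cosine_lr(step, max_steps, base_lr, warmup_steps=0):
    """Cosine decay from base_lr to 0, with optional linear warmup (assumed 0 here)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


pairs_per_step = effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS)
total_pairs_seen = pairs_per_step * MAX_STEPS
scaling = lora_scaling(LORA_ALPHA, LORA_R)
response_token_budget = MAX_LENGTH - MAX_PROMPT_LENGTH
lr_midpoint = cosine_lr(MAX_STEPS // 2, MAX_STEPS, LEARNING_RATE)

print(f"pairs per optimizer step: {pairs_per_step}")
print(f"pairs seen over training: {total_pairs_seen}")
print(f"LoRA scaling (alpha / r): {scaling}")
print(f"tokens left for response: {response_token_budget}")
print(f"learning rate at step 50: {lr_midpoint:.2e}")
```

Under these assumptions the run takes 100 optimizer steps of 16 preference pairs each, the LoRA update is applied at a scale of 1.0, and at least 512 tokens remain for the response when a prompt uses its full 1024-token allowance.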

Intended Use Cases

This model targets applications where truthful, aligned output matters. DPO fine-tuning on the truthy-dpo dataset is intended to reduce factual errors and make generated text more reliable, so the model is a candidate for tasks that need factual accuracy. The card reports no benchmark results, so evaluate it on your own task before relying on it.

Popular Sampler Settings

Featherless lists the top 3 parameter combinations its users apply to this model. Each configuration sets temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.