nbeerbower/bophades-mistral-truthy-DPO-7B

  • Task: Text generation
  • Concurrency Cost: 1
  • Model Size: 7B
  • Quant: FP8
  • Ctx Length: 8k
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights

nbeerbower/bophades-mistral-truthy-DPO-7B is a 7-billion-parameter causal language model, fine-tuned from bophades-v2-mistral-7B using Direct Preference Optimization (DPO). Training used the jondurbin/truthy-dpo-v0.1 preference dataset, which aims to improve truthfulness and alignment. The model is therefore tuned to favor the dataset's preferred responses over its rejected ones, which makes it a candidate for text generation where truthful, aligned output matters.

Model Overview

nbeerbower/bophades-mistral-truthy-DPO-7B is a 7-billion-parameter language model fine-tuned from bophades-v2-mistral-7B. Fine-tuning used Direct Preference Optimization (DPO) on the jondurbin/truthy-dpo-v0.1 dataset.

Key Characteristics

  • Base Model: Fine-tuned from bophades-v2-mistral-7B.
  • Fine-tuning Method: Uses Direct Preference Optimization (DPO) for alignment.
  • Training Data: Leverages the truthy-dpo-v0.1 dataset, suggesting an emphasis on generating factually consistent or preferred responses.
  • Training Environment: Fine-tuned on an A100 GPU via Google Colab.
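As a rough illustration of the fine-tuning method listed above, the sketch below computes the standard per-example DPO loss from summed sequence log-probabilities. It is a minimal pure-Python sketch, not the training code used for this model: the function name `dpo_loss` is hypothetical, and `beta=0.1` is an assumed value, since the beta used for this run is not stated here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (hypothetical helper; beta=0.1 is an assumption).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How strongly each model prefers the chosen response over the rejected one.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp

    # DPO rewards the policy for widening its margin relative to the reference.
    logits = beta * (policy_margin - ref_margin)

    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# When the policy matches the reference, the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# When the policy prefers the chosen response more than the reference does,
# the loss falls below that baseline.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The loss only depends on how the policy's preference margin compares with the reference model's, which is why DPO needs paired chosen and rejected responses such as those in a preference dataset.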

Technical Configuration

The DPO training used the following LoRA and training settings:

  • LoRA Configuration: r=16, lora_alpha=16, lora_dropout=0.05, targeting key attention and feed-forward layers.
  • Training Parameters: per_device_train_batch_size=2, gradient_accumulation_steps=2, learning_rate=2e-5, max_steps=420.
  • Context Length: Configured with max_prompt_length=1024 and max_length=1536 for DPO training.
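The settings above imply a few derived quantities that are easy to miss when reading the list. The sketch below collects the listed values in a plain dataclass and works out the arithmetic. The class name `DpoRunSettings` and its methods are hypothetical, `num_devices=1` reflects the single A100 mentioned earlier, and the LoRA scaling assumes the conventional `lora_alpha / r` factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DpoRunSettings:
    """Values copied from the configuration list; the class is illustrative."""
    lora_r: int = 16
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    per_device_train_batch_size: int = 2
    gradient_accumulation_steps: int = 2
    learning_rate: float = 2e-5
    max_steps: int = 420
    max_prompt_length: int = 1024
    max_length: int = 1536
    num_devices: int = 1  # assumption: one A100, as stated for the training environment

    def lora_scaling(self) -> float:
        # Conventional LoRA scaling applied to the low-rank update (assumed).
        return self.lora_alpha / self.lora_r

    def effective_batch_size(self) -> int:
        # Preference pairs contributing to each optimizer step.
        return (self.per_device_train_batch_size
                * self.gradient_accumulation_steps
                * self.num_devices)

    def pairs_processed(self) -> int:
        # Total preference pairs processed over the run, counting repeats.
        return self.effective_batch_size() * self.max_steps

    def completion_budget(self) -> int:
        # Tokens left for the response when the prompt uses its full budget.
        return self.max_length - self.max_prompt_length


settings = DpoRunSettings()
print(settings.lora_scaling())          # 1.0
print(settings.effective_batch_size())  # 4
print(settings.pairs_processed())       # 1680
print(settings.completion_budget())     # 512
```

Note that the 1536-token training limit is separate from the 8k context length listed for serving: the former bounds the sequences seen during DPO, not what the model accepts at inference time.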

Potential Use Cases

Because it was DPO-tuned on a truthfulness-focused preference dataset, this model is best suited to applications where truthful, preference-aligned text generation matters.
