nbeerbower/bophades-mistral-truthy-DPO-7B
nbeerbower/bophades-mistral-truthy-DPO-7B is a 7-billion-parameter causal language model, fine-tuned from the bophades-v2-mistral-7B base model using Direct Preference Optimization (DPO). Training on the jondurbin/truthy-dpo-v0.1 dataset improves its truthfulness and alignment, making it suitable for applications that require faithful, preference-aligned text generation.
Model Overview
nbeerbower/bophades-mistral-truthy-DPO-7B is a 7 billion parameter language model built upon the bophades-v2-mistral-7B architecture. This model has undergone a fine-tuning process using Direct Preference Optimization (DPO) on the jondurbin/truthy-dpo-v0.1 dataset.
Key Characteristics
- Base Model: Fine-tuned from bophades-v2-mistral-7B.
- Fine-tuning Method: Utilizes Direct Preference Optimization (DPO) for alignment.
- Training Data: Leverages the `jondurbin/truthy-dpo-v0.1` dataset, suggesting an emphasis on generating factually consistent or preferred responses.
- Training Environment: Fine-tuned on an A100 GPU via Google Colab.
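The preference data can be inspected directly from the Hugging Face Hub. A minimal sketch: the `prompt`/`chosen`/`rejected` column names follow the usual DPO convention and should be verified against the dataset card.

```python
from datasets import load_dataset

# Load the preference dataset used for this model's DPO fine-tune.
dataset = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")

# DPO preference data conventionally pairs each prompt with a
# preferred ("chosen") and a dispreferred ("rejected") response.
sample = dataset[0]
print(sample["prompt"])
print(sample["chosen"])
print(sample["rejected"])
```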
Technical Configuration
The DPO training used the following LoRA and trainer settings (a reconstruction sketch follows the list):
- LoRA Configuration: `r=16`, `lora_alpha=16`, `lora_dropout=0.05`, targeting key attention and feed-forward layers.
- Training Parameters: `per_device_train_batch_size=2`, `gradient_accumulation_steps=2`, `learning_rate=2e-5`, `max_steps=420`.
- Context Length: `max_prompt_length=1024` and `max_length=1536` for DPO training.
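These hyperparameters map naturally onto a `peft` LoraConfig plus `trl`'s DPOTrainer. The sketch below is a minimal reconstruction, not the author's exact script, and makes several assumptions: a recent `trl` release (≥ 0.12, where `max_prompt_length`/`max_length` live on `DPOConfig` and the tokenizer is passed as `processing_class`), a hypothetical `target_modules` list standing in for the card's "key attention and feed-forward layers", and the base model's Hub path.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "nbeerbower/bophades-v2-mistral-7B"  # assumed Hub path for the base model
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA settings from the card; target_modules is an assumption covering
# the attention and feed-forward projections of a Mistral-style model.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Training parameters and context lengths from the card.
args = DPOConfig(
    output_dir="bophades-mistral-truthy-DPO-7B",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    max_steps=420,
    max_prompt_length=1024,
    max_length=1536,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=load_dataset("jondurbin/truthy-dpo-v0.1", split="train"),
    processing_class=tokenizer,
    peft_config=peft_config,  # with LoRA, the frozen base model serves as the reference
)
trainer.train()
```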
Potential Use Cases
Thanks to its DPO fine-tuning on a truth-focused dataset, this model is particularly suited to applications where generating aligned, truthful, or preference-driven text is crucial.
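The model can be loaded with the standard transformers API. A minimal inference sketch: the chat template is assumed to be inherited from the Mistral instruct convention, and the example question is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/bophades-mistral-truthy-DPO-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Format a single-turn prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Is the Great Wall of China visible from space?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; sampling parameters can be tuned per application.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```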