CorticalStack/mistral-7b-jondurbin-truthy-dpo
CorticalStack/mistral-7b-jondurbin-truthy-dpo is a 7 billion parameter language model, fine-tuned from Mistral-7B-v0.1 on the jondurbin/truthy-dpo-v0.1 dataset. It is trained with DPO (Direct Preference Optimization) with the aim of improving truthfulness and alignment, which makes it a candidate for applications that need factually reliable text generation. It keeps Mistral's 8192-token context length.
CorticalStack/mistral-7b-jondurbin-truthy-dpo Overview
This model is a 7 billion parameter language model, derived from the foundational Mistral-7B-v0.1 architecture. Its primary distinction lies in its fine-tuning process, which utilizes Direct Preference Optimization (DPO) on the specific jondurbin/truthy-dpo-v0.1 dataset. This training methodology aims to improve the model's ability to generate factually consistent and truthful responses.
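As a rough illustration of what DPO optimizes, the sketch below computes the standard per-pair DPO loss from summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the policy and under a frozen reference model. The function name `dpo_loss` and the example log-probabilities are hypothetical; the default `beta=0.1` matches the value listed in the training arguments. This is a plain-Python sketch of the published loss, not the training code used for this model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio)))."""
    # How much more likely each response is under the policy than under the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch to avoid overflow for large |x|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities, for illustration only.
no_preference = dpo_loss(-10.0, -10.0, -10.0, -10.0)   # policy identical to reference
prefers_chosen = dpo_loss(-8.0, -12.0, -10.0, -10.0)   # policy shifted toward the chosen answer
prefers_rejected = dpo_loss(-12.0, -8.0, -10.0, -10.0) # policy shifted toward the rejected answer
```

When the policy matches the reference the loss is log 2; it falls as the policy raises the chosen response relative to the rejected one, and rises in the opposite case.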
Key Training Details
- Base Model: mistralai/Mistral-7B-v0.1
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Dataset: jondurbin/truthy-dpo-v0.1
- LoRA Configuration:
  - r: 16
  - LoRA alpha: 16
  - LoRA dropout: 0.05
- Training Arguments:
  - Batch size: 4
  - Gradient accumulation steps: 4
  - Optimizer: paged_adamw_32bit
  - Max steps: 100
  - Learning rate: 5e-05
  - Learning rate scheduler type: cosine
  - Beta: 0.1
  - Max prompt length: 1024
  - Max length: 1536
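The listed hyperparameters imply a few derived quantities that are easy to overlook. The sketch below collects the values from the list above into plain dictionaries and computes them. The dictionary keys and the `derived` helper are hypothetical names chosen for illustration. It assumes the batch size is per device, that standard LoRA scaling (alpha / r) applies, and that "Max length" bounds prompt plus response together; the card does not state any of these explicitly.

```python
# Values copied from the training details; key names are illustrative only.
lora = {"r": 16, "lora_alpha": 16, "lora_dropout": 0.05}
train = {
    "batch_size": 4,                  # assumed to be per device
    "gradient_accumulation_steps": 4,
    "optimizer": "paged_adamw_32bit",
    "max_steps": 100,
    "learning_rate": 5e-05,
    "lr_scheduler_type": "cosine",
    "beta": 0.1,
    "max_prompt_length": 1024,
    "max_length": 1536,               # assumed to cover prompt + response
}


def derived(lora, train):
    """Compute quantities implied by the hyperparameters (per device)."""
    effective_batch = train["batch_size"] * train["gradient_accumulation_steps"]
    return {
        # Standard LoRA multiplies the low-rank update by alpha / r.
        "lora_scaling": lora["lora_alpha"] / lora["r"],
        # Preference pairs contributing to each optimizer step.
        "effective_batch": effective_batch,
        # Upper bound on preference pairs consumed over the whole run.
        "pairs_seen": effective_batch * train["max_steps"],
        # Tokens left for the response when the prompt uses its full budget.
        "response_budget": train["max_length"] - train["max_prompt_length"],
    }


summary = derived(lora, train)
```

Under these assumptions the run is short: 100 optimizer steps at an effective batch of 16 pairs per device, with a LoRA scaling factor of 1.0 and at least 512 tokens available for each response.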
Intended Use Cases
This model is particularly well-suited for applications where the generation of truthful and aligned content is critical. The DPO fine-tuning with the truthy-dpo dataset suggests an emphasis on reducing factual errors and improving the reliability of generated text, making it a candidate for tasks requiring high factual accuracy.