CorticalStack/mistral-7b-distilabel-truthy-dpo
CorticalStack/mistral-7b-distilabel-truthy-dpo is a 7 billion parameter language model, fine-tuned from Mistral-7B-v0.1 using the distilabel-truthy-dpo-v0.1 dataset. This model leverages DPO (Direct Preference Optimization) to enhance its truthfulness and alignment. It is designed for tasks requiring accurate and aligned responses, building upon the Mistral architecture.
CorticalStack/mistral-7b-distilabel-truthy-dpo Overview
This model is a 7 billion parameter language model, specifically a DPO (Direct Preference Optimization) fine-tuned version of the original mistralai/Mistral-7B-v0.1 base model. The fine-tuning process utilized the mlabonne/distilabel-truthy-dpo-v0.1 dataset, indicating an emphasis on improving the model's truthfulness and alignment with preferred responses.
Key Characteristics
- Base Model: Mistral-7B-v0.1
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Training Dataset: mlabonne/distilabel-truthy-dpo-v0.1, suggesting a focus on truthfulness.
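To make the fine-tuning method concrete: DPO optimizes the policy to prefer the "chosen" response over the "rejected" one relative to a frozen reference model, using the loss -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). A minimal sketch of that loss for a single preference pair, in plain Python (the log-probability values below are illustrative, not from this model):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or rejected
    completion under the policy being trained or the frozen reference model.
    beta=0.1 matches the value listed in this model's training arguments.
    """
    # Implicit rewards: how much more (or less) the policy likes each
    # completion than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Negative log-sigmoid of the reward margin; minimized when the
    # policy's preference for the chosen answer grows.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy favors the chosen completion more than the reference does,
# so the loss falls below the no-preference baseline of -log(0.5).
print(dpo_loss(-10.0, -14.0, -12.0, -13.0))
```

When the policy and reference agree exactly, the margin is zero and the loss is -log(0.5) ≈ 0.693; training pushes it lower by widening the margin.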
Training Configuration
The fine-tuning involved specific LoRA (Low-Rank Adaptation) and training arguments:
- LoRA Parameters:
  - r: 16
  - LoRA alpha: 16
  - LoRA dropout: 0.05
- Training Arguments:
  - Batch size: 4
  - Gradient accumulation steps: 4
  - Optimizer: paged_adamw_32bit
  - Max steps: 100
  - Learning rate: 5e-05
  - Learning rate scheduler type: cosine
  - Beta: 0.1
  - Max prompt length: 1024
  - Max length: 1536
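The model card does not publish the training script, but hyperparameters like these are typically wired together with the `trl` and `peft` libraries. A configuration sketch under that assumption (API details such as where `beta` and the length limits are passed vary between `trl` versions; the dataset and model identifiers are taken from the card):

```python
# Sketch only: assumes a trl + peft DPO setup; not the card's exact script.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
dataset = load_dataset("mlabonne/distilabel-truthy-dpo-v0.1", split="train")

# LoRA parameters from the list above
peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05,
                         task_type="CAUSAL_LM")

# Training arguments from the list above
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    max_steps=100,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    output_dir="mistral-7b-distilabel-truthy-dpo",  # placeholder path
)

trainer = DPOTrainer(
    model=model,                # reference model is created internally
    args=training_args,
    beta=0.1,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,    # trains LoRA adapters, not full weights
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```

With `peft_config` supplied, only the low-rank adapter weights are updated during DPO, keeping the 7B base model frozen.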
Potential Use Cases
This model is suitable for applications where generating factually accurate and aligned text is crucial, benefiting from the DPO fine-tuning on a truth-focused dataset.