CorticalStack/mistral-7b-distilabel-truthy-dpo

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 8k · Published: Mar 5, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

CorticalStack/mistral-7b-distilabel-truthy-dpo is a 7 billion parameter language model, fine-tuned from Mistral-7B-v0.1 on the distilabel-truthy-dpo-v0.1 dataset. It uses DPO (Direct Preference Optimization) to improve truthfulness and alignment, and is intended for tasks that require accurate, well-aligned responses.


CorticalStack/mistral-7b-distilabel-truthy-dpo Overview

This model is a 7 billion parameter language model, specifically a DPO (Direct Preference Optimization) fine-tuned version of the original mistralai/Mistral-7B-v0.1 base model. The fine-tuning process utilized the mlabonne/distilabel-truthy-dpo-v0.1 dataset, indicating an emphasis on improving the model's truthfulness and alignment with preferred responses.

Key Characteristics

  • Base Model: Mistral-7B-v0.1
  • Fine-tuning Method: Direct Preference Optimization (DPO)
  • Training Dataset: mlabonne/distilabel-truthy-dpo-v0.1, suggesting a focus on truthfulness.
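DPO trains the policy directly on preference pairs, without a separate reward model. Using the standard DPO objective (Rafailov et al.), with the base Mistral-7B-v0.1 serving as the frozen reference model π_ref, the per-pair loss is:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
= -\log \sigma\!\left(
    \beta \left[
      \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right]
  \right)
```

where y_w and y_l are the preferred and rejected responses to prompt x, and β (set to 0.1 in this model's training configuration) controls how far the fine-tuned policy may drift from the reference.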

Training Configuration

The fine-tuning used the following LoRA (Low-Rank Adaptation) and training arguments:

  • LoRA Parameters:
    • r: 16
    • LoRA alpha: 16
    • LoRA dropout: 0.05
  • Training Arguments:
    • Batch size: 4
    • Gradient accumulation steps: 4
    • Optimizer: paged_adamw_32bit
    • Max steps: 100
    • Learning rate: 5e-05
    • Learning rate scheduler type: cosine
    • Beta: 0.1
    • Max prompt length: 1024
    • Max length: 1536
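The numbers above imply a few derived quantities worth noting: with a batch size of 4 and 4 gradient accumulation steps, each optimizer update sees 16 preference pairs, and the cosine scheduler decays the learning rate from 5e-05 toward zero over the 100 steps. A minimal sketch (assuming no warmup, which the card does not mention):

```python
import math

# Hyperparameters taken from the model card's training configuration.
batch_size = 4
grad_accum_steps = 4
max_steps = 100
peak_lr = 5e-05

# Effective batch size per optimizer update.
effective_batch = batch_size * grad_accum_steps  # 16 preference pairs per step

def cosine_lr(step, total=max_steps, peak=peak_lr):
    """Cosine decay from peak_lr to ~0 over total steps (no warmup assumed)."""
    progress = min(step, total) / total
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))

print(effective_batch)    # 16
print(cosine_lr(0))       # 5e-05 at the start
print(cosine_lr(50))      # 2.5e-05 halfway through
print(cosine_lr(100))     # ~0 at the end
```

With only 100 optimizer steps, this run processes roughly 1,600 preference pairs, i.e. a short adaptation pass rather than a full retraining, which is typical for LoRA-based DPO fine-tunes.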

Potential Use Cases

This model is suitable for applications where generating factually accurate and aligned text is crucial, benefiting from the DPO fine-tuning on a truth-focused dataset.
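The model card does not specify a chat template. Mistral-derived fine-tunes commonly follow the `[INST] ... [/INST]` format of the Mistral-7B-Instruct family, so a reasonable (but unverified) prompt builder might look like this:

```python
from typing import Optional

def format_mistral_prompt(user_message: str, system: Optional[str] = None) -> str:
    """Build a Mistral-style instruction prompt.

    NOTE: this [INST] format is an assumption based on common Mistral
    fine-tune conventions; the model card does not state a chat template,
    so verify against the model's tokenizer config before relying on it.
    """
    content = f"{system}\n\n{user_message}" if system else user_message
    return f"<s>[INST] {content} [/INST]"

prompt = format_mistral_prompt(
    "Is the Great Wall of China visible from the Moon?",
    system="Answer truthfully and concisely.",
)
print(prompt)
```

Keep prompts within the training limits listed above (max prompt length 1024 tokens, max total length 1536) for behavior closest to the fine-tuning regime.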

Popular Sampler Settings

The top three parameter combinations used by Featherless users for this model adjust the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.