nbeerbower/bophades-mistral-truthy-DPO-7B
nbeerbower/bophades-mistral-truthy-DPO-7B is a 7-billion-parameter causal language model, fine-tuned from the bophades-v2-mistral-7B base model using Direct Preference Optimization (DPO). Training on the jondurbin/truthy-dpo-v0.1 dataset improves its truthfulness and alignment, making it suitable for applications that require faithful, preference-aligned text generation.
Model Overview
nbeerbower/bophades-mistral-truthy-DPO-7B is a 7 billion parameter language model built upon the bophades-v2-mistral-7B architecture. This model has undergone a fine-tuning process using Direct Preference Optimization (DPO) on the jondurbin/truthy-dpo-v0.1 dataset.
Key Characteristics
- Base Model: Fine-tuned from bophades-v2-mistral-7B.
- Fine-tuning Method: Utilizes Direct Preference Optimization (DPO) for alignment.
- Training Data: Leverages the `jondurbin/truthy-dpo-v0.1` dataset, suggesting an emphasis on generating factually consistent or preferred responses.
- Training Environment: Fine-tuned on an A100 GPU via Google Colab.
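The preference data can be inspected directly from the Hugging Face Hub. A minimal sketch: the `prompt`/`chosen`/`rejected` column names follow the usual DPO convention and should be verified against the dataset card.

```python
from datasets import load_dataset

# Load the preference dataset used for this model's DPO fine-tune.
dataset = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")

# DPO preference data conventionally pairs each prompt with a
# preferred ("chosen") and a dispreferred ("rejected") response.
sample = dataset[0]
print(sample["prompt"])
print(sample["chosen"])
print(sample["rejected"])
```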
Technical Configuration
The DPO training used the following LoRA and trainer settings (a reconstruction sketch follows the list):
- LoRA Configuration: `r=16`, `lora_alpha=16`, `lora_dropout=0.05`, targeting key attention and feed-forward layers.
- Training Parameters: `per_device_train_batch_size=2`, `gradient_accumulation_steps=2`, `learning_rate=2e-5`, `max_steps=420`.
- Context Length: `max_prompt_length=1024` and `max_length=1536` for DPO training.
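These hyperparameters map naturally onto a `peft` LoraConfig plus `trl`'s DPOTrainer. The sketch below is a minimal reconstruction, not the author's exact script, and makes several assumptions: a recent `trl` release (≥ 0.12, where `max_prompt_length`/`max_length` live on `DPOConfig` and the tokenizer is passed as `processing_class`), a hypothetical `target_modules` list standing in for the card's "key attention and feed-forward layers", and the base model's Hub path.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "nbeerbower/bophades-v2-mistral-7B"  # assumed Hub path for the base model
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA settings from the card; target_modules is an assumption covering
# the attention and feed-forward projections of a Mistral-style model.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Training parameters and context lengths from the card.
args = DPOConfig(
    output_dir="bophades-mistral-truthy-DPO-7B",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    max_steps=420,
    max_prompt_length=1024,
    max_length=1536,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=load_dataset("jondurbin/truthy-dpo-v0.1", split="train"),
    processing_class=tokenizer,
    peft_config=peft_config,  # with LoRA, the frozen base model serves as the reference
)
trainer.train()
```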
Potential Use Cases
Thanks to its DPO fine-tuning on a truth-focused dataset, this model is particularly suited to applications where generating aligned, truthful, or preference-driven text is crucial.
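The model can be loaded with the standard transformers API. A minimal inference sketch: the chat template is assumed to be inherited from the Mistral instruct convention, and the example question is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/bophades-mistral-truthy-DPO-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Format a single-turn prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Is the Great Wall of China visible from space?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; sampling parameters can be tuned per application.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```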