Model Overview
nbeerbower/llama-3-bophades-v3-8B is an 8 billion parameter model built on the Llama-3-8B architecture. It has been fine-tuned with Direct Preference Optimization (DPO) to improve truthfulness and mathematical reasoning.
Key Capabilities
- Enhanced Truthfulness: Fine-tuned on the jondurbin/truthy-dpo-v0.1 dataset to improve the factual accuracy of its responses.
- Mathematical Reasoning: Leverages the kyujinpy/orca_math_dpo dataset to strengthen its ability to solve mathematical problems.
- DPO Fine-tuning: Uses Direct Preference Optimization for alignment, aiming to produce more helpful and harmless outputs.
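DPO trains the model to prefer the chosen response over the rejected one in each preference pair. The per-pair loss can be sketched in plain Python as below; the beta value is illustrative and not taken from this model's training configuration.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed token log-probability of the chosen/rejected
    completion under the policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # Numerically stable form of -log(sigmoid(logits)) == log(1 + exp(-logits))
    return math.log1p(math.exp(-logits))
```

When the policy and reference agree exactly, the loss is log(2); it shrinks as the policy assigns relatively more probability to the chosen response.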
Training Details
The model was fine-tuned on an A100 GPU using Google Colab. The DPO run used LoRA adapters (r=16, lora_alpha=16, lora_dropout=0.05) with a learning rate of 5e-5 for 1000 steps. For dataset preparation, the truthy-dpo-v0.1 and orca_math_dpo datasets were concatenated and formatted into a ChatML-like layout for DPO training, with a max_prompt_length of 2048 and a max_length of 4096.
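The card does not show the exact formatting template. The sketch below illustrates one plausible ChatML-style layout for a preference pair; the field names (`prompt`, `chosen`, `rejected`) are assumptions, and the prompt cap is applied in characters here for simplicity, whereas a real pipeline would truncate by tokens.

```python
def format_chatml_pair(example: dict, max_prompt_length: int = 2048) -> dict:
    """Format one preference example into a ChatML-like layout for DPO.

    `example` is assumed to hold 'prompt', 'chosen', and 'rejected' strings.
    """
    prompt = example["prompt"][:max_prompt_length]  # character-level cap, for illustration
    wrapped_prompt = (
        "<|im_start|>user\n" + prompt + "<|im_end|>\n<|im_start|>assistant\n"
    )
    return {
        "prompt": wrapped_prompt,
        "chosen": example["chosen"] + "<|im_end|>",
        "rejected": example["rejected"] + "<|im_end|>",
    }
```

The trainer then sees both completions continuing from the same wrapped prompt, which is what the DPO objective compares.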
Ideal Use Cases
This model is particularly well-suited for applications where high factual accuracy and strong mathematical problem-solving are critical. It can be beneficial for tasks such as generating accurate summaries, answering factual questions, and assisting with mathematical computations.