nbeerbower/llama-3-bophades-v2-8B
nbeerbower/llama-3-bophades-v2-8B is an 8-billion-parameter language model in the Llama-3-8b family, fine-tuned with Direct Preference Optimization (DPO) on top of llama-3-sauce-v1-8B. It targets improved truthfulness and mathematical reasoning, making it suited to tasks that require accurate factual recall and reliable numerical problem-solving.
Overview
nbeerbower/llama-3-bophades-v2-8B is an 8-billion-parameter large language model derived from the Llama-3-8b architecture. It was fine-tuned using Direct Preference Optimization (DPO) in a Google Colab environment on an A100 GPU. The base model, llama-3-sauce-v1-8B, was further trained on two datasets: jondurbin/truthy-dpo-v0.1 and kyujinpy/orca_math_dpo. This targeted fine-tuning aims to enhance the model's factual accuracy and mathematical reasoning.
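As a standard Llama-3-derived checkpoint, the model can be loaded with the Hugging Face transformers library. The sketch below is illustrative, not from this model card: it assumes `transformers` and `torch` are installed, and the dtype/device choices are common defaults for an 8B model rather than documented requirements.

```python
# Hedged sketch: loading the model via the standard
# AutoModelForCausalLM.from_pretrained API. The bf16/device_map
# settings are assumptions, not requirements from the model card.
MODEL_ID = "nbeerbower/llama-3-bophades-v2-8B"

def load_model():
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # ~16 GB of weights at bf16
        device_map="auto",           # place layers across available GPUs
    )
    return tokenizer, model
```

At bf16 precision the weights alone occupy roughly 16 GB, so a single A100-class GPU (or quantized loading) is a reasonable target.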
Key Capabilities
- Enhanced Truthfulness: Fine-tuned on a dataset designed to improve factual correctness and reduce hallucinations.
- Improved Mathematical Reasoning: Benefits from training on a dataset focused on mathematical problem-solving.
- Llama-3 Base: Leverages the strong foundational capabilities of the Llama-3-8b model.
- DPO Fine-tuning: Uses Direct Preference Optimization to align model outputs with human preferences.
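To make the DPO bullet concrete, the snippet below sketches the core DPO objective (a sigmoid loss over policy-vs-reference log-probability ratios for a chosen/rejected pair). It is a minimal illustration with toy numbers, not code or hyperparameters from this model's actual training run.

```python
# Hedged sketch of the DPO loss for a single preference pair.
# All log-probabilities below are toy values, not model outputs.
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log(sigmoid(beta * (chosen ratio - rejected ratio)))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) written stably as softplus(-x)
    return math.log1p(math.exp(-logits)) if logits > -30 else -logits

# When the policy favors the chosen answer more than the reference
# model does, the loss drops below log(2) ≈ 0.693 (the neutral value).
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0)
```

Training nudges the policy to widen the chosen/rejected gap relative to the frozen reference model, with `beta` controlling how far the policy may drift from it.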
Good For
- Applications requiring high factual accuracy.
- Tasks involving mathematical calculations and logical reasoning.
- Use cases where a robust 8B parameter model with improved truthfulness is beneficial.