nbeerbower/llama-3-bophades-v3-8B

Hugging Face
Text generation · Model size: 8B · Quantization: FP8 · Context length: 8K · License: llama3 · Architecture: Transformer

nbeerbower/llama-3-bophades-v3-8B is an 8-billion-parameter language model based on Llama-3-8B, fine-tuned with Direct Preference Optimization (DPO). It was trained on a combination of the jondurbin/truthy-dpo-v0.1 and kyujinpy/orca_math_dpo datasets to improve truthfulness and mathematical reasoning. The model is designed for tasks that require accurate factual responses and robust mathematical problem-solving.


Model Overview

nbeerbower/llama-3-bophades-v3-8B is an 8 billion parameter model built upon the Llama-3-8B architecture. It has been fine-tuned using Direct Preference Optimization (DPO) to enhance its performance in specific domains.
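A minimal usage sketch with the Hugging Face transformers library is shown below. The dtype and generation parameters are illustrative choices, not values prescribed by this model card, and running it requires downloading the model weights:

```python
# Sketch: load the model and generate a response with transformers.
# Assumes `transformers` and `torch` are installed and a suitable GPU is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/llama-3-bophades-v3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; pick a dtype your hardware supports
    device_map="auto",
)

messages = [{"role": "user", "content": "What is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```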

Key Capabilities

  • Enhanced Truthfulness: Fine-tuned on the jondurbin/truthy-dpo-v0.1 dataset to improve the factual accuracy of its responses.
  • Mathematical Reasoning: Leverages the kyujinpy/orca_math_dpo dataset to strengthen its ability to solve mathematical problems.
  • DPO Fine-tuning: Utilizes Direct Preference Optimization for alignment, aiming to produce more helpful and harmless outputs.
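The DPO objective behind this alignment step can be sketched in a few lines of plain Python. The beta value below is a common illustrative default, not the value used to train this model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or rejected
    response under the policy being trained or the frozen reference model.
    """
    # Log-ratios of policy vs. reference for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # -log(sigmoid(beta * margin)): small when the policy prefers the chosen response
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the policy favors the chosen response more than the reference does
high_margin = dpo_loss(-10.0, -30.0, -20.0, -20.0)  # policy strongly prefers chosen
no_margin = dpo_loss(-20.0, -20.0, -20.0, -20.0)    # policy matches the reference
```

When the policy and reference agree (zero margin), the loss sits at log 2; it decreases monotonically as the policy widens the gap in favor of the chosen response.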

Training Details

The model was fine-tuned on an A100 GPU using Google Colab. The DPO training process involved specific configurations for LoRA (r=16, lora_alpha=16, lora_dropout=0.05) and training arguments (learning_rate=5e-5, max_steps=1000). The dataset preparation involved concatenating and formatting the truthy-dpo-v0.1 and orca_math_dpo datasets into a ChatML-like format for DPO training, with a max_prompt_length of 2048 and max_length of 4096.
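The ChatML-like formatting step described above can be sketched as follows. The field names follow the common prompt/chosen/rejected DPO convention, and the exact template applied for this model is an assumption:

```python
def format_dpo_example(example):
    """Format one preference example into a ChatML-like layout for DPO training.

    Expects a dict with 'system', 'prompt', 'chosen', and 'rejected' fields,
    as in datasets like truthy-dpo-v0.1 and orca_math_dpo (field names assumed).
    """
    system = ""
    if example.get("system"):
        system = f"<|im_start|>system\n{example['system']}<|im_end|>\n"
    prompt = (
        f"{system}<|im_start|>user\n{example['prompt']}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    return {
        "prompt": prompt,
        "chosen": f"{example['chosen']}<|im_end|>\n",
        "rejected": f"{example['rejected']}<|im_end|>\n",
    }

row = format_dpo_example({
    "system": "You are a truthful assistant.",
    "prompt": "What is 2 + 2?",
    "chosen": "4",
    "rejected": "5",
})
```

A DPO trainer then scores the chosen and rejected completions against the shared prompt, truncating to the stated max_prompt_length of 2048 and max_length of 4096.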

Ideal Use Cases

This model is particularly well-suited for applications where high factual accuracy and strong mathematical problem-solving are critical. It can be beneficial for tasks such as generating accurate summaries, answering factual questions, and assisting with mathematical computations.