OwenArli/ArliAI-Llama-3-8B-Instruct-DPO-v0.2
Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | License: llama3 | Architecture: Transformer | Status: Warm

OwenArli/ArliAI-Llama-3-8B-Instruct-DPO-v0.2 is an 8 billion parameter instruction-tuned language model based on Meta-Llama-3-8B-Instruct, with an 8192 token context length. It was trained with Direct Preference Optimization (DPO) on the mlabonne/orpo-dpo-mix-40k dataset to refine its instruction-following behavior. This release supersedes an earlier version that had tokenization issues, though its current open LLM benchmark results are noted as unexpectedly low.
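
For local experimentation, the checkpoint can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch, assuming a CUDA-capable GPU and that the repository ships the standard Llama 3 chat template; it loads the weights in bf16 locally, whereas the FP8 quantization listed above refers to the hosted deployment.

```python
# Minimal sketch: load the model locally with Hugging Face transformers.
# Assumes a CUDA-capable GPU and the standard Llama 3 chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OwenArli/ArliAI-Llama-3-8B-Instruct-DPO-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # local load in bf16; the hosted version runs FP8
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize Direct Preference Optimization in two sentences."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```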


Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each configuration sets the sampler parameters listed below; a request sketch using these parameters follows the list.

- temperature
- top_p
- top_k
- frequency_penalty
- presence_penalty
- repetition_penalty
- min_p
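
The sketch below shows how these sampler parameters might be passed when calling the hosted model through an OpenAI-compatible endpoint. The base URL and all numeric values are placeholders and assumptions, not the actual user configurations (those values are not shown on this page); parameters outside the OpenAI schema, such as top_k, repetition_penalty, and min_p, are passed via extra_body.

```python
# Hedged sketch: query the hosted model via an OpenAI-compatible API.
# Base URL and all sampler values are placeholders, not the real popular configs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="OwenArli/ArliAI-Llama-3-8B-Instruct-DPO-v0.2",
    messages=[{"role": "user", "content": "Write a haiku about preference optimization."}],
    # Standard OpenAI sampler parameters (placeholder values):
    temperature=0.8,
    top_p=0.95,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Non-standard parameters are commonly forwarded via extra_body:
    extra_body={
        "top_k": 40,
        "repetition_penalty": 1.1,
        "min_p": 0.05,
    },
)
print(response.choices[0].message.content)
```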