Model Overview
allenai/llama-3.1-tulu-2-dpo-8b is an 8-billion-parameter language model developed by AllenAI, building upon the Meta-Llama-3.1-8B architecture. It is part of the Tulu series of models designed to act as helpful assistants.
Key Characteristics & Training
The model is trained in two fine-tuning stages:
- Initial Fine-tuning: Trained on a diverse blend of publicly available, synthetic, and human-created datasets, as detailed in the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2".
- DPO Alignment: Further aligned using Direct Preference Optimization (DPO) on the UltraFeedback dataset, which contains 64k prompts and GPT-4 ranked model completions. This DPO step aims to improve instruction following and helpfulness.
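To make the DPO step concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not the training code used for this model. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions; the inputs are the summed log-probabilities that the policy and a frozen reference model assign to the preferred ("chosen") and dispreferred ("rejected") completions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    beta=0.1 is an assumed value for illustration, not a setting
    taken from the model card.
    """
    # How much more the policy favours each completion than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin between chosen and rejected completions.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)), computed in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree exactly, the margin is zero and the loss is log 2; the loss falls as the policy shifts probability toward the chosen completion relative to the reference.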
Performance Highlights
Compared with Llama 3.1 8B Instruct, this DPO-tuned version improves on some benchmarks and trails on others:
- TruthfulQA: Achieves 70.3% (%Info+True), significantly higher than Llama 3.1 8B Instruct's 31.1%.
- IFEval (loose acc): Scores 52.3%, below Llama 3.1 8B Instruct's 75.6% but above the non-DPO Tulu 2 Llama 3.1 8B.
- Codex HumanEval Pass@10: Reaches 69.1%.
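For readers unfamiliar with the Pass@10 metric, the sketch below shows the commonly used unbiased pass@k estimator for a single problem. The model card does not say how its 69.1% figure was computed, so this is an illustration of the metric rather than a description of the evaluation that was run. The function name `pass_at_k` and its parameters are chosen here for illustration: `n` is the number of samples generated, `c` the number that pass the unit tests, and `k` the sample budget.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without
    replacement from n generated samples, is among the c correct ones."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A benchmark-level score is the mean of this value over all problems.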
Intended Use
This model is intended for use as a helpful assistant, particularly in scenarios requiring robust instruction following and truthful responses. Users should format inputs using the <|user|> and <|assistant|> tags, ensuring a newline after <|assistant|> for optimal generation quality.
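The input format described above can be sketched as a small helper. The name `format_tulu_prompt` is hypothetical, and placing the user message on its own line after <|user|> is an assumption; the text above only specifies the two tags and the newline after <|assistant|>.

```python
def format_tulu_prompt(user_message: str) -> str:
    """Wrap a single user turn in the <|user|> / <|assistant|> tags.

    The newline after <|user|> is an assumed layout; the trailing
    newline after <|assistant|> is the one the model card asks for.
    """
    return f"<|user|>\n{user_message.strip()}\n<|assistant|>\n"


prompt = format_tulu_prompt("Summarize the DPO training stage in one sentence.")
```

The resulting string would be passed to the tokenizer as-is, so that generation starts on the line following <|assistant|>.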