Columbia-NLP/LION-LLaMA-3-8b-dpo-v1.0

Text generation · Model size: 8B · Quant: FP8 · Context length: 8k · Published: Jun 28, 2024 · Architecture: Transformer

Columbia-NLP/LION-LLaMA-3-8b-dpo-v1.0 is an 8-billion-parameter LLaMA-3-based language model developed by Columbia-NLP. It is fine-tuned with an empirically optimized three-stage pipeline of SFT, DPO, and online preference learning; this version corresponds to the DPO stage. The model outperforms the official instruct model on benchmarks such as MT-Bench and OpenLLM, making it suitable for general conversational AI and instruction-following tasks.


Overview

Columbia-NLP/LION-LLaMA-3-8b-dpo-v1.0 is an 8-billion-parameter language model, part of the LION series developed by Columbia-NLP. It is fine-tuned from Columbia-NLP/LION-LLaMA-3-8b-sft-v1.0 using Direct Preference Optimization (DPO), the second stage of an empirically optimized three-stage pipeline (SFT, DPO, and online DPO). The pipeline improves performance through techniques such as sequence packing and loss masking during SFT and an enlarged preference dataset during DPO.
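To make the DPO stage concrete, the sketch below computes the standard DPO loss for a single preference pair. The function name, argument names, and the `beta=0.1` default are illustrative assumptions (the paper's exact hyperparameters are not stated here); the formula itself is the standard DPO objective from Rafailov et al.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Inputs are summed token log-probabilities of each response under
    the trainable policy and the frozen reference (here, SFT) model.
    beta=0.1 is a common default, not necessarily the value used for
    this model.
    """
    # Implicit rewards: how much more each response is preferred by
    # the policy relative to the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Negative log-sigmoid of the margin: the loss shrinks as the
    # policy widens the preference gap over the reference.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At initialization the policy equals the reference, the margin is zero, and the loss is -log(0.5) ≈ 0.693; training pushes the margin positive and the loss toward zero.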

Key Capabilities & Performance

  • Enhanced Alignment: Achieves strong alignment through its DPO fine-tuning stage, building upon an SFT base.
  • Competitive Benchmarks: Scores 8.12 on MT-Bench and 71.28 on OpenLLM, surpassing the official LLaMA-3-8b-it model on both.
  • Optimized Training: Benefits from an empirically optimized training pipeline designed to significantly improve language model performance.

Intended Use Cases

  • General Instruction Following: Designed for various instruction-following tasks, as indicated by its strong benchmark results.
  • Conversational AI: Suitable for generating human-like text in response to prompts, as shown in the provided chat template example.
  • Research and Development: Can be used by researchers interested in advanced alignment techniques and empirically optimized training pipelines. Further details on training datasets, code, and evaluation scripts are available in the associated paper and codebase.
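For conversational use, prompts should follow the LLaMA-3 chat format. The sketch below assembles such a prompt by hand to show the structure; in practice you would call `tokenizer.apply_chat_template` from the Hugging Face `transformers` library, and the helper name here is a hypothetical illustration.

```python
def build_llama3_prompt(messages):
    """Assemble a LLaMA-3-style chat prompt from a list of
    {"role": ..., "content": ...} dicts.

    The special tokens follow the standard LLaMA-3 chat format;
    normally tokenizer.apply_chat_template produces this for you.
    """
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt([
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
```

The resulting string is what the tokenizer would feed to the model before generation begins at the assistant turn.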