Columbia-NLP/LION-Gemma-2b-odpo-v1.0

Text generation · Model size: 2.5B · Quantization: BF16 · Context length: 8k · Published: Jun 28, 2024 · Architecture: Transformer

LION-Gemma-2b-odpo-v1.0 is a 2.5-billion-parameter language model developed by Columbia-NLP, fine-tuned from Gemma-2b with an empirically optimized three-stage pipeline comprising supervised fine-tuning (SFT), direct preference optimization (DPO), and online DPO. It demonstrates improved performance over official instruct models of comparable size on benchmarks such as Arena-Hard, AlpacaEval-2, MT-Bench, and OpenLLM. The model is primarily English-focused and designed for general conversational and instruction-following tasks.

Model Overview

LION-Gemma-2b-odpo-v1.0 is a 2.5-billion-parameter language model from Columbia-NLP and part of the LION series, which is developed with an empirically optimized three-stage alignment pipeline: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and an online DPO (ODPO) stage. The model is fine-tuned from Columbia-NLP/LION-Gemma-2b-dpo-v1.0, which is itself based on Google's Gemma-2b architecture.
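For context on the DPO stage, the snippet below sketches the standard DPO objective that preference-optimization pipelines like this one build on. It is a minimal PyTorch illustration, not the LION training code; the function name and log-probability tensors are hypothetical inputs.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective on a batch of (chosen, rejected) pairs.

    Each argument is a 1-D tensor of summed token log-probs for the
    completions under the policy or the frozen reference model.
    `beta` scales the implicit reward.
    """
    # Implicit rewards: how much the policy prefers each completion
    # relative to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss pushes the chosen reward above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```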

Key Capabilities & Performance

This model is notable for its performance relative to official instruct models, including those tuned with closed-source data and algorithms. The LION pipeline relies on techniques such as sequence packing and loss masking during SFT, and a scaled-up preference dataset during DPO (a loss-masking sketch follows the score discussion below). Benchmarked against other 2B- and 7B-parameter models, LION-Gemma-2b-odpo-v1.0 posts strong scores:

  • Arena-Hard: 5.0
  • AlpacaEval-2: 9.57
  • MT-Bench: 6.75
  • OpenLLM: 55.98

These scores indicate strong instruction-following and general conversational ability; on specific metrics, the model outperforms larger models such as LLaMA-2-7b-chat and Vicuna-7b-v1.5.
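The loss masking mentioned above is commonly implemented by setting prompt-token labels to -100, the index that PyTorch's cross-entropy ignores, so the SFT loss is computed only over response tokens. Below is a minimal sketch under that assumption; the prompt/response strings and Gemma-style turn markers are illustrative, not taken from the LION training data.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Columbia-NLP/LION-Gemma-2b-odpo-v1.0")

# Illustrative prompt/response pair in a Gemma-style chat format.
prompt = "<start_of_turn>user\nWhat does loss masking do?<end_of_turn>\n<start_of_turn>model\n"
response = "It restricts the SFT loss to the response tokens.<end_of_turn>"

prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]

input_ids = prompt_ids + response_ids
# Label prompt positions with -100 so torch.nn.CrossEntropyLoss (whose default
# ignore_index is -100) skips them; only response tokens contribute to the loss.
labels = [-100] * len(prompt_ids) + response_ids
```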

Intended Uses

LION-Gemma-2b-odpo-v1.0 is primarily intended for general-purpose instruction following and text generation in English. Developers can integrate it through standard Hugging Face transformers APIs; for reproducible results, prompts should follow the model's specified chat template. Training details, code, and evaluation scripts are available in the associated paper and codebase.
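A minimal usage sketch with the transformers library follows, assuming the repository ships a tokenizer with a chat template; the sampling parameters are illustrative, not recommendations from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Columbia-NLP/LION-Gemma-2b-odpo-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# BF16 matches the precision listed in the model metadata.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain direct preference optimization in one paragraph."}
]
# apply_chat_template formats the conversation the way the model expects,
# which the card stresses is important for reproducibility.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```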