Columbia-NLP/LION-Gemma-2b-dpo-v1.0
LION-Gemma-2b-dpo-v1.0 is a 2.5 billion parameter Gemma-based causal language model developed by Columbia-NLP. It is fine-tuned with the empirically optimized SFT and DPO stages of the LION pipeline and outperforms the official Gemma-2b-it instruct model on several benchmarks. The model handles general language understanding and generation tasks, achieving competitive scores for its size.
Model Overview
Columbia-NLP's LION-Gemma-2b-dpo-v1.0 is a 2.5 billion parameter language model built on the Gemma-2b architecture. It is a product of the LION-series training pipeline, which emphasizes an empirically optimized three-stage process: supervised fine-tuning (SFT), direct preference optimization (DPO), and online preference learning (online DPO). This version applies the first two stages, SFT and DPO, building on the Columbia-NLP/LION-Gemma-2b-sft-v1.0 model.
Key Capabilities & Performance
The LION pipeline incorporates techniques such as sequence packing and loss masking during SFT, and an enlarged preference dataset during DPO, which collectively improve model quality. Benchmarks indicate that LION-Gemma-2b-dpo-v1.0 achieves strong results for its size, outperforming the official Gemma-2b-it model and other 2B-parameter models on Arena-Hard (4.6), AlpacaEval-2 (8.75), MT-Bench (6.58), and OpenLLM (55.35). These results point to robust instruction following and general conversational ability.
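For context on what the DPO stage optimizes: the standard DPO objective scores a chosen response above a rejected one relative to a frozen reference model. Below is a minimal per-example sketch of that loss (the model card does not publish the training code, so the function name and the `beta=0.1` default are illustrative assumptions, not the pipeline's actual hyperparameters):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (delta_policy - delta_ref)).

    Each argument is the summed log-probability of a full response
    under the policy or the frozen reference model. beta is an
    illustrative default; the LION pipeline's value may differ.
    """
    logits = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    # -log(sigmoid(logits)); small when the policy prefers the
    # chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When policy and reference agree exactly (all log-probs equal), the loss is log 2; widening the policy's chosen-vs-rejected margin beyond the reference's drives it toward zero.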
Intended Uses
This model is suitable for a variety of natural language processing applications that require a compact yet capable instruction-tuned model. Its optimized training process makes it a strong candidate for tasks where efficient, high-quality responses are needed. Developers can integrate it through the standard Hugging Face transformers pipeline; for best results, apply the model's chat template to prompts.
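A minimal sketch of that integration, assuming a recent transformers version (passing chat-style `messages` to a text-generation pipeline applies the tokenizer's chat template automatically; the prompt text and generation settings are illustrative):

```python
import torch
from transformers import pipeline

model_id = "Columbia-NLP/LION-Gemma-2b-dpo-v1.0"

# device_map="auto" places weights on GPU when one is available;
# bfloat16 halves memory versus float32 on supported hardware.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-style input; the model's chat template is applied for us.
messages = [
    {"role": "user", "content": "Explain direct preference optimization in one sentence."},
]
out = pipe(messages, max_new_tokens=128, do_sample=False)
print(out[0]["generated_text"][-1]["content"])
```

Greedy decoding (`do_sample=False`) keeps the output reproducible; switch to sampling with a temperature for more varied responses.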