## Model Overview
anakin87/gemma-2b-orpo is a 2.6-billion-parameter language model by anakin87, fine-tuned from the google/gemma-2b base model. It was trained with ORPO (Odds Ratio Preference Optimization), which combines supervised fine-tuning and preference alignment into a single, more efficient step. Because ORPO needs no separate reference model, it trains faster and consumes less memory than methods such as DPO.
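To make the ORPO objective concrete, here is a minimal numeric sketch of its two parts: the usual SFT (negative log-likelihood) loss on the chosen response, plus a weighted odds-ratio penalty that pushes the model to prefer chosen over rejected responses. The function name, the weight `lam`, and the example probabilities are illustrative, not values from this model's training run.

```python
import math

def odds(p: float) -> float:
    # Odds of a response with probability p under the model: p / (1 - p).
    return p / (1.0 - p)

def orpo_loss(nll_chosen: float, p_chosen: float, p_rejected: float,
              lam: float = 0.1) -> float:
    # Odds-ratio term: -log sigmoid(log(odds_chosen / odds_rejected)).
    log_odds_ratio = math.log(odds(p_chosen) / odds(p_rejected))
    l_or = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
    # Total ORPO objective: SFT loss on the chosen response
    # plus the weighted odds-ratio penalty.
    return nll_chosen + lam * l_or

# When the model already prefers the chosen response, the penalty is small:
loss = orpo_loss(nll_chosen=1.2, p_chosen=0.6, p_rejected=0.2, lam=0.1)
```

Note that the penalty shrinks toward zero as the model's probability of the chosen response grows relative to the rejected one, so a well-aligned model is dominated by the plain SFT term.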
## Key Capabilities & Performance
This model demonstrates competitive performance for its size, as evaluated on various benchmarks:
- Nous Benchmark Suite: achieves an average score of 39.45, outperforming mlabonne/Gemmalpaca-2B, google/gemma-2b-it, and the base google/gemma-2b model.
- Open LLM Leaderboard: records an average score of 47.35, higher than google/gemma-2b-it's average of 42.75. Specific scores include 49.15 on the AI2 Reasoning Challenge and 73.72 on HellaSwag.
## Training Details
The model was trained using the alvarobartt/dpo-mix-7k-simplified dataset, a streamlined version of argilla/dpo-mix-7k. Training was conducted using the Hugging Face TRL framework. A quantized GGUF version is also available for efficient deployment.
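The training setup above can be sketched with TRL's ORPO classes. This is a hedged configuration sketch only: the hyperparameter values are illustrative, not the settings used for this model, and the exact `ORPOTrainer` keyword names may vary between TRL versions.

```python
# Illustrative ORPO fine-tuning setup with Hugging Face TRL.
# Hyperparameters are placeholders, not this model's actual training config.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

config = ORPOConfig(
    output_dir="gemma-2b-orpo",
    beta=0.1,                      # weight of the odds-ratio term
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    max_length=1024,
    max_prompt_length=512,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Running this requires accepting the Gemma license on the Hub and a GPU with enough memory for a 2B-parameter model.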
## Usage
This model is suitable for general text generation tasks and runs smoothly in environments like Google Colab, especially when quantized. Example usage with the Transformers library is provided in the original model card, along with a notebook demonstrating chat and RAG applications built with Haystack.
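As a lightweight illustration that needs no model download, the snippet below builds a prompt in the Gemma chat format (the `<start_of_turn>`/`<end_of_turn>` markers Gemma-family chat models are trained on). The helper function is hypothetical; in practice you would let `tokenizer.apply_chat_template` produce this string and pass it to a Transformers `pipeline("text-generation", model="anakin87/gemma-2b-orpo")`.

```python
def build_gemma_prompt(user_message: str) -> str:
    # Gemma chat models wrap each conversational turn in
    # <start_of_turn>...<end_of_turn> markers; the trailing
    # "<start_of_turn>model" cues the model to respond.
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_gemma_prompt("Explain ORPO in one sentence.")
```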