anakin87/gemma-2b-orpo
anakin87/gemma-2b-orpo is a 2.6 billion parameter language model fine-tuned from Google's Gemma-2B using the ORPO (Odds Ratio Preference Optimization) training paradigm. This model combines supervised fine-tuning and preference alignment for improved performance with faster training and reduced memory usage. It demonstrates strong performance for its size on benchmarks like Nous and Open LLM Leaderboard, making it suitable for general language generation tasks where efficiency is key.
Model Overview
anakin87/gemma-2b-orpo is a 2.6 billion parameter language model developed by anakin87, fine-tuned from the google/gemma-2b base model. It utilizes the ORPO (Odds Ratio Preference Optimization) training paradigm, which integrates supervised fine-tuning and preference alignment into a single, more efficient process. This approach offers benefits such as faster training and lower memory consumption compared to traditional methods like DPO, as it does not require a reference model.
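To make the single-process idea concrete, here is a minimal, hypothetical sketch of the ORPO objective in plain Python: the usual SFT negative log-likelihood on the chosen response plus a weighted odds-ratio penalty that pushes the policy to favor chosen over rejected responses. The function names and the weight `lam` are illustrative, not taken from the model card.

```python
import math

def odds(p):
    # Odds of a response under the policy, given its sequence probability p.
    return p / (1.0 - p)

def orpo_penalty(p_chosen, p_rejected):
    # -log sigmoid(log odds ratio): small when the chosen response
    # is already much more likely than the rejected one.
    log_or = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))

def orpo_loss(nll_chosen, p_chosen, p_rejected, lam=0.1):
    # Total loss = SFT NLL on the chosen response + lam * odds-ratio penalty.
    # No reference model appears anywhere, unlike DPO.
    return nll_chosen + lam * orpo_penalty(p_chosen, p_rejected)
```

Because the penalty depends only on the policy's own probabilities, no frozen reference model needs to be kept in memory, which is where the speed and memory savings come from.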
Key Capabilities & Performance
This model demonstrates competitive performance for its size, as evaluated on various benchmarks:
- Nous Benchmark Suite: achieves an average score of 39.45, outperforming mlabonne/Gemmalpaca-2B, google/gemma-2b-it, and the base google/gemma-2b model.
- Open LLM Leaderboard: records an average score of 47.35, higher than google/gemma-2b-it's average of 42.75. Specific scores include 49.15 on the AI2 Reasoning Challenge and 73.72 on HellaSwag.
Training Details
The model was trained using the alvarobartt/dpo-mix-7k-simplified dataset, a streamlined version of argilla/dpo-mix-7k. Training was conducted using the Hugging Face TRL framework. A quantized GGUF version is also available for efficient deployment.
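A training run along these lines could be reproduced with TRL's `ORPOTrainer`. The sketch below is hypothetical: the hyperparameters are not from the model card, and the exact trainer arguments vary between TRL versions.

```python
def train():
    # Imports are inside the function so the sketch can be read
    # without datasets/transformers/trl installed.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import ORPOConfig, ORPOTrainer

    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
    dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

    config = ORPOConfig(
        output_dir="gemma-2b-orpo",
        beta=0.1,  # weight on the odds-ratio term (hypothetical value)
        per_device_train_batch_size=2,
        num_train_epochs=1,
    )
    trainer = ORPOTrainer(
        model=model,
        args=config,
        train_dataset=dataset,
        processing_class=tokenizer,  # `tokenizer=` in older TRL releases
    )
    trainer.train()
```

The dataset's chosen/rejected preference pairs feed both the SFT term and the odds-ratio term, so no separate reward or reference model is configured.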
Usage
This model is suitable for general text generation tasks and can run smoothly in environments like Colab, even with quantization. Example usage for text generation with the Transformers library is provided on the model card, along with a notebook for chat and RAG applications using Haystack.
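A minimal inference sketch with the Transformers `pipeline` API might look like the following; the prompt and generation settings are illustrative, and the heavy work is deferred into a function so the snippet can be inspected without downloading the model.

```python
model_id = "anakin87/gemma-2b-orpo"

# A chat-style request in the role/content format pipelines accept.
messages = [{"role": "user", "content": "Explain ORPO in one sentence."}]

def chat(messages, max_new_tokens=128):
    # Import inside the function so the sketch loads without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_id, device_map="auto")
    out = pipe(messages, max_new_tokens=max_new_tokens)
    return out[0]["generated_text"]

if __name__ == "__main__":
    print(chat(messages))
```

On a free Colab GPU a 4-bit or GGUF-quantized variant of the model would keep memory use modest; the full-precision checkpoint also fits, given its 2.6B-parameter size.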