## Model Overview
anakin87/gemma-2b-orpo is a 2.6-billion-parameter language model by anakin87, fine-tuned from the google/gemma-2b base model. It was trained with ORPO (Odds Ratio Preference Optimization), which combines supervised fine-tuning and preference alignment into a single, more efficient step. Because ORPO needs no separate reference model, it trains faster and consumes less memory than methods such as DPO.
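To make the ORPO objective concrete, here is a minimal numeric sketch of its two parts: the usual SFT (negative log-likelihood) loss on the chosen response, plus a weighted odds-ratio penalty that pushes the model to prefer chosen over rejected responses. The function name, the weight `lam`, and the example probabilities are illustrative, not values from this model's training run.

```python
import math

def odds(p: float) -> float:
    # Odds of a response with probability p under the model: p / (1 - p).
    return p / (1.0 - p)

def orpo_loss(nll_chosen: float, p_chosen: float, p_rejected: float,
              lam: float = 0.1) -> float:
    # Odds-ratio term: -log sigmoid(log(odds_chosen / odds_rejected)).
    log_odds_ratio = math.log(odds(p_chosen) / odds(p_rejected))
    l_or = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
    # Total ORPO objective: SFT loss on the chosen response
    # plus the weighted odds-ratio penalty.
    return nll_chosen + lam * l_or

# When the model already prefers the chosen response, the penalty is small:
loss = orpo_loss(nll_chosen=1.2, p_chosen=0.6, p_rejected=0.2, lam=0.1)
```

Note that the penalty shrinks toward zero as the model's probability of the chosen response grows relative to the rejected one, so a well-aligned model is dominated by the plain SFT term.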
## Key Capabilities & Performance
This model demonstrates competitive performance for its size, as evaluated on various benchmarks:
- Nous Benchmark Suite: achieves an average score of 39.45, outperforming mlabonne/Gemmalpaca-2B, google/gemma-2b-it, and the base google/gemma-2b model.
- Open LLM Leaderboard: records an average score of 47.35, higher than google/gemma-2b-it's average of 42.75. Specific scores include 49.15 on the AI2 Reasoning Challenge and 73.72 on HellaSwag.
## Training Details
The model was trained using the alvarobartt/dpo-mix-7k-simplified dataset, a streamlined version of argilla/dpo-mix-7k. Training was conducted using the Hugging Face TRL framework. A quantized GGUF version is also available for efficient deployment.
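The training setup above can be sketched with TRL's ORPO classes. This is a hedged configuration sketch only: the hyperparameter values are illustrative, not the settings used for this model, and the exact `ORPOTrainer` keyword names may vary between TRL versions.

```python
# Illustrative ORPO fine-tuning setup with Hugging Face TRL.
# Hyperparameters are placeholders, not this model's actual training config.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

config = ORPOConfig(
    output_dir="gemma-2b-orpo",
    beta=0.1,                      # weight of the odds-ratio term
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    max_length=1024,
    max_prompt_length=512,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Running this requires accepting the Gemma license on the Hub and a GPU with enough memory for a 2B-parameter model.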
## Usage
This model is suitable for general text generation tasks and runs smoothly in environments like Google Colab, especially when quantized. Example usage with the Transformers library is provided in the original model card, along with a notebook demonstrating chat and RAG applications built with Haystack.
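As a lightweight illustration that needs no model download, the snippet below builds a prompt in the Gemma chat format (the `<start_of_turn>`/`<end_of_turn>` markers Gemma-family chat models are trained on). The helper function is hypothetical; in practice you would let `tokenizer.apply_chat_template` produce this string and pass it to a Transformers `pipeline("text-generation", model="anakin87/gemma-2b-orpo")`.

```python
def build_gemma_prompt(user_message: str) -> str:
    # Gemma chat models wrap each conversational turn in
    # <start_of_turn>...<end_of_turn> markers; the trailing
    # "<start_of_turn>model" cues the model to respond.
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_gemma_prompt("Explain ORPO in one sentence.")
```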