Aratako/Llama-Gemma-2-27b-ORPO-iter3

TEXT GENERATION · Concurrency Cost: 2 · Model Size: 27B · Quant: FP8 · Ctx Length: 32k · Published: Dec 16, 2024 · License: llama3.1 · Architecture: Transformer

Aratako/Llama-Gemma-2-27b-ORPO-iter3 is a 27 billion parameter instruction-tuned causal language model developed by Aratako. It is built on a Llama and Gemma 2 base and further refined with ORPO (Odds Ratio Preference Optimization) after initial CPO_SimPO instruction tuning. The model is designed for general instruction following, with this multi-stage fine-tuning process intended to improve response quality.
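Since the model is built on a google/gemma-2-27b base, prompts are presumably formatted with the Gemma 2 chat template. The sketch below builds such a prompt by hand; this is an assumption based on the base model, not a format documented on this card, and in practice the tokenizer's `apply_chat_template` should be preferred.

```python
def build_prompt(messages):
    # Gemma-2-style chat format (assumed from the gemma-2-27b base):
    # each turn is wrapped in <start_of_turn>/<end_of_turn>, roles are
    # "user" and "model", and generation continues after the final
    # "<start_of_turn>model" header.
    parts = ["<bos>"]
    for msg in messages:
        role = "model" if msg["role"] == "assistant" else "user"
        parts.append(f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")
    return "".join(parts)

prompt = build_prompt([{"role": "user", "content": "Hello"}])
```

The resulting string would then be tokenized and passed to the model for generation.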


Overview

Aratako/Llama-Gemma-2-27b-ORPO-iter3 is a 27 billion parameter instruction-tuned model developed by Aratako. It is based on the google/gemma-2-27b architecture and incorporates elements from Llama and Qwen. The model underwent a multi-stage fine-tuning process: supervised instruction tuning, two iterations of CPO_SimPO, and finally ORPO (Odds Ratio Preference Optimization).

Key Capabilities

  • Instruction Following: Enhanced through ORPO fine-tuning, making it suitable for various instruction-based tasks.
  • Iterative Refinement: Benefits from an iterative training approach, building upon Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2.
  • Training Methodology: Trained with axolotl, using ORPO-specific settings including orpo_alpha: 0.1 and learning_rate: 8e-7.

Training Details

The model was trained on the Aratako/iterative-dpo-data-for-ORPO-iter3 dataset, using a max_prompt_len of 512, a max_length of 2560, and a sequence_len of 2560. It was developed as part of a competition for the Matsuo Lab Large Language Model Course 2024.
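The settings above can be collected into an axolotl-style ORPO config fragment. Only the values stated on this card are real; the surrounding keys follow common axolotl conventions and should be checked against the actual training config before reuse.

```yaml
# Illustrative fragment only - values from this card, key names assumed
base_model: Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2
rl: orpo
orpo_alpha: 0.1
learning_rate: 8e-7
datasets:
  - path: Aratako/iterative-dpo-data-for-ORPO-iter3
sequence_len: 2560
max_prompt_len: 512
max_length: 2560
```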

Licensing

The model's usage is subject to several licenses due to its base models and training data:

  • META LLAMA 3.1 COMMUNITY LICENSE
  • Gemma Terms of Use
  • Qwen LICENSE AGREEMENT (requires attribution like "Built with Qwen")