Aratako/Llama-Gemma-2-27b-ORPO-iter3

Parameters: 27B
Precision: FP8
Context length: 32,768
Released: Dec 16, 2024
License: llama3.1
Repository: Hugging Face

Overview

Aratako/Llama-Gemma-2-27b-ORPO-iter3 is a 27-billion-parameter instruction-tuned model developed by Aratako. It is based on the google/gemma-2-27b architecture and incorporates elements derived from Llama and Qwen models, which is reflected in its licensing. The model went through a multi-stage fine-tuning process: supervised instruction tuning, two iterations of CPO_SimPO, and a final application of ORPO (Odds Ratio Preference Optimization).
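The card itself does not include a usage example. The snippet below is a minimal inference sketch with the Hugging Face transformers library, assuming the repository ships a chat template and standard Gemma 2-compatible weights; the dtype, device settings, and prompt are illustrative.

```python
# Minimal inference sketch (assumes a recent transformers release with Gemma 2
# support and a chat template in the repository; adjust dtype/device as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Aratako/Llama-Gemma-2-27b-ORPO-iter3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Briefly introduce yourself."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```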

Key Capabilities

  • Instruction Following: Enhanced through ORPO fine-tuning, making it suitable for various instruction-based tasks.
  • Iterative Refinement: Benefits from an iterative training approach, building upon Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2.
  • Training Methodology: Trained with axolotl, using ORPO-specific settings including orpo_alpha: 0.1 and learning_rate: 8e-7.

Training Details

The model was trained on the Aratako/iterative-dpo-data-for-ORPO-iter3 dataset with a max_prompt_len of 512, a max_length of 2560, and a sequence_len of 2560. It was developed for a competition held as part of the Matsuo Lab Large Language Model Course 2024.
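The card reports that training was done with axolotl, but the full config is not reproduced here. As a rough approximation only, the sketch below expresses the reported ORPO hyperparameters with trl's ORPOTrainer; the mapping of axolotl's orpo_alpha to trl's beta, and the batch size, epoch count, and output path, are assumptions rather than values from the card.

```python
# Approximate ORPO stage in trl (not the original axolotl setup). Hyperparameters
# mirror the card where available: alpha/beta 0.1, learning rate 8e-7,
# max_prompt_length 512, max_length 2560. A 27B model needs multi-GPU or
# DeepSpeed/FSDP in practice; this sketch omits that plumbing.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2"  # prior iteration used as the starting point
train_dataset = load_dataset("Aratako/iterative-dpo-data-for-ORPO-iter3", split="train")

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

config = ORPOConfig(
    output_dir="llama-gemma-2-27b-orpo-iter3",  # placeholder path
    beta=0.1,                        # assumed equivalent of axolotl's orpo_alpha
    learning_rate=8e-7,
    max_prompt_length=512,
    max_length=2560,
    per_device_train_batch_size=1,   # not stated on the card
    gradient_accumulation_steps=8,   # not stated on the card
    num_train_epochs=1,              # not stated on the card
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,  # expects "prompt", "chosen", "rejected" columns
    processing_class=tokenizer,   # use tokenizer= on older trl releases
)
trainer.train()
```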

Licensing

The model's usage is subject to several licenses due to its base models and training data:

  • META LLAMA 3.1 COMMUNITY LICENSE
  • Gemma Terms of Use
  • Qwen LICENSE AGREEMENT (requires attribution like "Built with Qwen")