Overview
Aratako/Llama-Gemma-2-27b-ORPO-iter3 is a 27-billion-parameter instruction-tuned model developed by Aratako. It is based on the google/gemma-2-27b architecture and incorporates elements from Llama and Qwen. The model underwent a multi-stage fine-tuning process: supervised instruction tuning, two iterations of CPO_SimPO, and a final application of ORPO (Odds Ratio Preference Optimization).
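To make the final training stage concrete, the core of the ORPO objective can be sketched in plain Python. This is a minimal illustration of the published ORPO formulation (an SFT loss on the chosen response plus a weighted odds-ratio preference term), not code from this model's actual training run; the function names are hypothetical, and `p_chosen`/`p_rejected` stand for length-normalized sequence probabilities.

```python
import math

def odds(p):
    # Odds of a length-normalized sequence probability p in (0, 1).
    return p / (1.0 - p)

def orpo_or_loss(p_chosen, p_rejected):
    # Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected)).
    log_or = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))

def orpo_loss(nll_chosen, p_chosen, p_rejected, alpha=0.1):
    # Total ORPO loss: SFT negative log-likelihood on the chosen response
    # plus alpha (cf. orpo_alpha: 0.1 above) times the odds-ratio term.
    return nll_chosen + alpha * orpo_or_loss(p_chosen, p_rejected)
```

When the model assigns equal probability to both responses, the odds-ratio term reduces to `log 2`; as the chosen response becomes more likely than the rejected one, the term shrinks toward zero.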
Key Capabilities
- Instruction Following: Enhanced through ORPO fine-tuning, making it suitable for a variety of instruction-based tasks.
- Iterative Refinement: Benefits from an iterative training approach, building upon Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2.
- Training Methodology: Trained with `axolotl`, using an ORPO-specific configuration that includes `orpo_alpha: 0.1` and a `learning_rate: 8e-7`.
Training Details
The model was trained on the Aratako/iterative-dpo-data-for-ORPO-iter3 dataset with a `max_prompt_len` of 512, a `max_length` of 2560, and a `sequence_len` of 2560. It was developed as part of a competition in the Matsuo Lab Large Language Model Course 2024.
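Pulling the stated hyperparameters together, the ORPO-relevant portion of an `axolotl` configuration might look like the following sketch. Only the base model, dataset path, and the hyperparameters named in this card come from the source; every other field is an illustrative assumption and may differ from the actual training config.

```yaml
# Illustrative axolotl config fragment for the ORPO stage (not the actual file)
base_model: Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2

rl: orpo                 # assumed axolotl RL mode for ORPO training
orpo_alpha: 0.1          # weight on the odds-ratio preference term
learning_rate: 8e-7

datasets:
  - path: Aratako/iterative-dpo-data-for-ORPO-iter3
    # dataset type/format omitted: not stated in this card

sequence_len: 2560
max_prompt_len: 512
max_length: 2560
```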
Licensing
The model's usage is subject to several licenses due to its base models and training data:
- META LLAMA 3.1 COMMUNITY LICENSE
- Gemma Terms of Use
- Qwen LICENSE AGREEMENT (requires attribution like "Built with Qwen")