Aratako/Llama-Gemma-2-27b-ORPO-iter3 is a 27-billion-parameter instruction-tuned causal language model developed by Aratako. It is built on Llama and Gemma 2 base models and further refined with ORPO (Odds Ratio Preference Optimization) after initial CPO_SimPO instruction tuning. The model targets general instruction following, with the preference-optimization stage intended to improve response quality.
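The ORPO stage mentioned above adds an odds-ratio preference penalty on top of the ordinary SFT loss. As a rough illustration (not this model's actual training code; function names, the `lam` weight, and the use of average sequence log-probabilities are assumptions here), the per-pair objective can be sketched as:

```python
import math

def log_odds(logp):
    # log-odds of a sequence from its (average) log-probability:
    # log(p / (1 - p)) = logp - log(1 - exp(logp)); requires logp < 0
    return logp - math.log1p(-math.exp(logp))

def orpo_penalty(logp_chosen, logp_rejected):
    # odds-ratio term: -log sigmoid(log_odds(chosen) - log_odds(rejected))
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-ratio)))

def orpo_loss(nll_chosen, logp_chosen, logp_rejected, lam=0.1):
    # total ORPO loss = SFT negative log-likelihood on the chosen
    # response plus a weighted odds-ratio penalty (lam is illustrative)
    return nll_chosen + lam * orpo_penalty(logp_chosen, logp_rejected)
```

When chosen and rejected responses are equally likely the penalty equals log 2, and it shrinks toward zero as the model assigns higher odds to the chosen response than to the rejected one.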