Ejafa/qwen2-0.5b-instruct-simpo-lr-5e-07-gamma-1.5
Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: Jun 21, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Ejafa/qwen2-0.5b-instruct-simpo-lr-5e-07-gamma-1.5 is a 0.5-billion-parameter instruction-tuned model based on Qwen2-0.5B-Instruct, fine-tuned by Ejafa Bassam and Yaroslav Ponomarenko as part of the Reinforcement Learning - 24 project at Peking University. It was trained with SimPO (Simple Preference Optimization), a reference-free preference optimization method, on the princeton-nlp/llama3-ultrafeedback dataset; the suffix in the model name records the training hyperparameters, a learning rate of 5e-7 and, in SimPO's notation, a target reward margin γ of 1.5. The model is intended for preference-optimization research, and the authors report reward metrics on its evaluation set.
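
For context, SimPO scores each response with a length-normalized, reference-free implicit reward and pushes the chosen response's reward above the rejected one's by at least the margin γ. The sketch below illustrates that objective; it is an assumption-laden illustration rather than the authors' training code, and the `beta` value is a placeholder, since the model name fixes only γ and the learning rate.

```python
# Hedged sketch of the SimPO objective, not the authors' training code.
# `chosen_logps` / `rejected_logps` are summed token log-probabilities of
# each response under the policy; `beta` is an assumed scaling constant.
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,    # sum of log p(token) over the chosen response
    rejected_logps: torch.Tensor,  # sum of log p(token) over the rejected response
    chosen_lens: torch.Tensor,     # token counts, used for length normalization
    rejected_lens: torch.Tensor,
    beta: float = 2.0,             # assumed; not specified on this card
    gamma: float = 1.5,            # target reward margin, from the model name
) -> torch.Tensor:
    # Length-normalized, reference-free implicit rewards.
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    # Logistic (Bradley-Terry style) loss with a target margin gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()
```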
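The card itself includes no usage snippet, so the following is a minimal sketch of loading the checkpoint with the Hugging Face transformers library, assuming the standard Qwen2 chat template; the prompt and generation settings are illustrative, not values from the authors.

```python
# Minimal sketch: load the checkpoint and generate a reply with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Ejafa/qwen2-0.5b-instruct-simpo-lr-5e-07-gamma-1.5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# Illustrative prompt; the checkpoint is assumed to use the Qwen2 chat template.
messages = [{"role": "user", "content": "Explain preference optimization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```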
