Name: Ejafa/qwen2-0.5b-instruct-simpo-lr-5e-07-gamma-1.5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Ejafa

Model Overview

This model, Ejafa/qwen2-0.5b-instruct-simpo-lr-5e-07-gamma-1.5, is a fine-tuned version of the Qwen2-0.5B-Instruct architecture. Developed by Ejafa Bassam and Yaroslav Ponomarenko at Peking University, it was trained as part of the Reinforcement Learning - 24 project with a specific focus on the SIMPO (Simple Preference Optimization) method.

Key Characteristics

Base Model: Qwen/Qwen2-0.5B-Instruct.
Training Dataset: Fine-tuned on the princeton-nlp/llama3-ultrafeedback dataset.
Optimization Method: Utilizes the SIMPO approach for preference alignment.
Training Hyperparameters: Employed a learning rate of 5e-07, a total training batch size of 128, and a cosine learning rate scheduler over 1 epoch.

Evaluation Performance

During evaluation, the model achieved a loss of 1.6594. Key reward metrics include:

Rewards/accuracies: 0.5282
Rewards/margins: 0.1325
Rewards/chosen: -3.3473
Rewards/rejected: -3.4798

Intended Uses

This model is suitable for research and applications involving preference-based learning and instruction following, particularly where a compact model size (0.5B parameters) is beneficial. Its training on a feedback dataset suggests potential for tasks requiring alignment with human preferences.