taozi555/MN-12B-Mag-Mell-R1-KTO
The taozi555/MN-12B-Mag-Mell-R1-KTO model is a 12 billion parameter language model, fine-tuned from inflatebot/MN-12B-Mag-Mell-R1. It was trained using the KTO (Kahneman-Tversky Optimization) method on the kto_rp dataset, achieving a training loss of 0.3763. This model is optimized for tasks benefiting from preference-based fine-tuning, demonstrating improved reward margins during training.
Model Overview
The taozi555/MN-12B-Mag-Mell-R1-KTO is a 12 billion parameter language model, fine-tuned by taozi555. It is based on the inflatebot/MN-12B-Mag-Mell-R1 architecture and has been specifically optimized using the KTO (Kahneman-Tversky Optimization) method.
Key Characteristics
- Base Model: Fine-tuned from `inflatebot/MN-12B-Mag-Mell-R1`.
- Fine-tuning Method: Kahneman-Tversky Optimization (KTO) on the `kto_rp` dataset.
- Performance Metrics: Achieved a training loss of 0.3763, with reward margins of 1.9446, indicating effective preference learning.
- Training Configuration: Trained with a learning rate of 5e-07, a per-device batch size of 1 (effective batch size 32 via gradient accumulation), and a cosine learning rate scheduler over 1 epoch.
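The cosine scheduler mentioned above decays the learning rate from its 5e-07 peak toward zero over the single training epoch. A minimal pure-Python sketch of that decay (standard cosine annealing without warmup; the step count is illustrative and the actual trainer implementation may differ):

```python
import math

PEAK_LR = 5e-07  # learning rate from the training configuration

def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine-annealed learning rate: peak_lr at step 0, 0 at total_steps."""
    return 0.5 * peak_lr * (1 + math.cos(math.pi * step / total_steps))

total = 1000  # hypothetical number of optimizer steps in one epoch
print(cosine_lr(0, total))           # starts at the 5e-07 peak
print(cosine_lr(total // 2, total))  # roughly half the peak at mid-training
print(cosine_lr(total, total))       # decays to ~0 by the end
```

The smooth decay to zero is why cosine schedules pair well with single-epoch preference fine-tuning: updates shrink gradually instead of stopping abruptly.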
Intended Use Cases
While specific intended uses are not detailed in the original model card, models fine-tuned with KTO are generally suitable for tasks requiring alignment with human preferences, such as:
- Generating responses that are preferred over rejected alternatives.
- Improving the quality and helpfulness of conversational AI.
- Tasks where nuanced preference learning is beneficial.
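To make the preference-learning mechanics concrete, here is a minimal pure-Python sketch of the per-example KTO loss in the spirit of the Kahneman-Tversky Optimization formulation. It is a simplification: `beta`, `kl_ref`, and the desirable/undesirable weights are illustrative placeholders, not the values used to train this model.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(reward: float, desirable: bool,
             beta: float = 0.1, kl_ref: float = 0.0,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Simplified per-example KTO loss.

    reward: implicit reward, i.e. log p_policy(y|x) - log p_ref(y|x).
    desirable: True for a preferred completion, False for a rejected one.
    kl_ref: the reference point (an estimate of the policy/reference KL).
    """
    if desirable:
        # Desirable outputs are pushed above the reference point.
        return lambda_d * (1.0 - sigmoid(beta * (reward - kl_ref)))
    # Undesirable outputs are pushed below the reference point.
    return lambda_u * (1.0 - sigmoid(beta * (kl_ref - reward)))

# Loss falls as a desirable completion's reward rises above the reference
# point, and as an undesirable completion's reward falls below it.
print(kto_loss(2.0, desirable=True) < kto_loss(0.0, desirable=True))     # True
print(kto_loss(-2.0, desirable=False) < kto_loss(0.0, desirable=False))  # True
```

Unlike DPO, KTO scores each completion against a fixed reference point rather than requiring paired chosen/rejected responses, which is why the reward margin (desirable minus undesirable rewards) reported above is the key training signal.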