taozi555/MN-12B-Mag-Mell-R1-KTO

Text Generation · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Ctx Length: 32k · Published: Oct 5, 2025 · License: other · Architecture: Transformer

The taozi555/MN-12B-Mag-Mell-R1-KTO model is a 12-billion-parameter language model fine-tuned from inflatebot/MN-12B-Mag-Mell-R1. It was trained with the KTO (Kahneman-Tversky Optimization) method on the kto_rp dataset, reaching a final training loss of 0.3763. The preference-based fine-tuning improved reward margins over the course of training, making the model suited to tasks that benefit from alignment with human preferences.


Model Overview

The taozi555/MN-12B-Mag-Mell-R1-KTO is a 12-billion-parameter language model published by taozi555. It builds on inflatebot/MN-12B-Mag-Mell-R1 and was further optimized using the KTO (Kahneman-Tversky Optimization) method.

Key Characteristics

  • Base Model: Fine-tuned from inflatebot/MN-12B-Mag-Mell-R1.
  • Fine-tuning Method: Utilizes Kahneman-Tversky Optimization (KTO) on the kto_rp dataset.
  • Performance Metrics: Achieved a final training loss of 0.3763 and a reward margin of 1.9446, indicating effective preference learning.
  • Training Configuration: Trained with a learning rate of 5e-07, a per-device batch size of 1 (effective batch size of 32 via gradient accumulation), and a cosine learning rate scheduler over 1 epoch; see the configuration sketch after this list.
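
These hyperparameters map naturally onto TRL's KTOTrainer. The snippet below is a minimal sketch of what such a run could look like, not the author's actual training script: only the learning rate, batch sizes, scheduler, and epoch count come from the model card, while the dataset path, schema, and everything else are assumptions for illustration.

```python
# Hypothetical KTO fine-tuning sketch using Hugging Face TRL.
# Reported values: lr 5e-7, per-device batch 1, effective batch 32, cosine schedule, 1 epoch.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base = "inflatebot/MN-12B-Mag-Mell-R1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# KTO expects unpaired preference data: "prompt", "completion", and a boolean "label"
# (True = desirable, False = undesirable). The kto_rp dataset is referenced in the card,
# but its location and schema are not published, so this file path is illustrative.
dataset = load_dataset("json", data_files="kto_rp.jsonl", split="train")

args = KTOConfig(
    output_dir="MN-12B-Mag-Mell-R1-KTO",
    learning_rate=5e-7,                 # reported learning rate
    per_device_train_batch_size=1,      # reported per-device batch size
    gradient_accumulation_steps=32,     # on a single device this yields the reported effective batch of 32
    num_train_epochs=1,                 # reported: 1 epoch
    lr_scheduler_type="cosine",         # reported scheduler
)

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,         # older TRL releases take tokenizer= instead
)
trainer.train()
```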

Intended Use Cases

While specific intended uses are not detailed in the original model card, models fine-tuned with KTO are generally suitable for tasks requiring alignment with human preferences, such as:

  • Generating responses that are preferred over rejected alternatives.
  • Improving the quality and helpfulness of conversational AI.
  • Tasks where nuanced preference learning is beneficial.
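
As a concrete usage example, the model can be loaded with the standard transformers text-generation APIs. This is a generic inference sketch rather than an official recipe from the model card; the dtype, device placement, sampling settings, and the assumption that the tokenizer ships a chat template are all illustrative choices.

```python
# Minimal inference sketch with transformers. The hosted listing advertises an FP8 quant,
# but the weights can also be loaded in bf16 as shown here (assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "taozi555/MN-12B-Mag-Mell-R1-KTO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a short scene set on a night train."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```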