taozi555/MN-12B-Mag-Mell-R1-KTO
Text generation · Concurrency cost: 1 · Model size: 12B · Quant: FP8 · Context length: 32k · Published: Oct 5, 2025 · License: other · Architecture: Transformer
The taozi555/MN-12B-Mag-Mell-R1-KTO model is a 12-billion-parameter language model fine-tuned from inflatebot/MN-12B-Mag-Mell-R1. It was trained with KTO (Kahneman-Tversky Optimization) on the kto_rp dataset, reaching a final training loss of 0.3763 with reward margins improving over the course of training. The fine-tune targets tasks that benefit from preference-based alignment.
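For a concrete picture of what a KTO fine-tuning run like this looks like, the sketch below uses TRL's `KTOTrainer`. This is an illustration of the general technique under stated assumptions, not the author's published pipeline: the dataset file path, its schema, and every hyperparameter are assumed. TRL's KTO format expects unpaired preference rows with `prompt`, `completion`, and a boolean `label` column; the exact layout of the `kto_rp` dataset is not documented on this page. Depending on your TRL version, the tokenizer argument may be named `tokenizer` instead of `processing_class`.

```python
# Minimal KTO fine-tuning sketch with TRL; dataset path and all
# hyperparameters are illustrative assumptions, not the author's recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base_id = "inflatebot/MN-12B-Mag-Mell-R1"  # base model named on this page

model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# KTO uses unpaired preference data: each row carries a prompt, a single
# completion, and a boolean label (True = desirable, False = undesirable).
# The local "kto_rp.jsonl" file here is a hypothetical stand-in.
dataset = load_dataset("json", data_files="kto_rp.jsonl", split="train")

args = KTOConfig(
    output_dir="mn-12b-mag-mell-r1-kto",
    beta=0.1,                        # KL-penalty strength (assumed value)
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_train_epochs=1,
    bf16=True,
)

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Note that KTO, unlike DPO, does not require paired chosen/rejected completions, which is why each dataset row above carries only one completion plus a desirability label.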