taozi555/MN-12B-Mag-Mell-R1-KTO
The taozi555/MN-12B-Mag-Mell-R1-KTO model is a 12 billion parameter language model, fine-tuned from inflatebot/MN-12B-Mag-Mell-R1. It was trained using the KTO (Kahneman-Tversky Optimization) method on the kto_rp dataset, achieving a training loss of 0.3763. This model is optimized for tasks benefiting from preference-based fine-tuning, demonstrating improved reward margins during training.
Model Overview
The taozi555/MN-12B-Mag-Mell-R1-KTO is a 12 billion parameter language model, fine-tuned by taozi555. It is based on the inflatebot/MN-12B-Mag-Mell-R1 architecture and has been specifically optimized using the KTO (Kahneman-Tversky Optimization) method.
Key Characteristics
- Base Model: Fine-tuned from `inflatebot/MN-12B-Mag-Mell-R1`.
- Fine-tuning Method: Kahneman-Tversky Optimization (KTO) on the `kto_rp` dataset.
- Performance Metrics: Achieved a training loss of 0.3763, with reward margins of 1.9446, indicating effective preference learning.
- Training Configuration: Trained with a learning rate of 5e-07, a per-device batch size of 1 (effective batch size 32 via gradient accumulation), and a cosine learning rate scheduler over 1 epoch.
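The cosine scheduler mentioned above decays the learning rate from its 5e-07 peak toward zero over the single training epoch. A minimal pure-Python sketch of that decay (standard cosine annealing without warmup; the step count is illustrative and the actual trainer implementation may differ):

```python
import math

PEAK_LR = 5e-07  # learning rate from the training configuration

def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine-annealed learning rate: peak_lr at step 0, 0 at total_steps."""
    return 0.5 * peak_lr * (1 + math.cos(math.pi * step / total_steps))

total = 1000  # hypothetical number of optimizer steps in one epoch
print(cosine_lr(0, total))           # starts at the 5e-07 peak
print(cosine_lr(total // 2, total))  # roughly half the peak at mid-training
print(cosine_lr(total, total))       # decays to ~0 by the end
```

The smooth decay to zero is why cosine schedules pair well with single-epoch preference fine-tuning: updates shrink gradually instead of stopping abruptly.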
Intended Use Cases
While specific intended uses are not detailed in the original model card, models fine-tuned with KTO are generally suitable for tasks requiring alignment with human preferences, such as:
- Generating responses that are preferred over rejected alternatives.
- Improving the quality and helpfulness of conversational AI.
- Tasks where nuanced preference learning is beneficial.
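To make the preference-learning mechanics concrete, here is a minimal pure-Python sketch of the per-example KTO loss in the spirit of the Kahneman-Tversky Optimization formulation. It is a simplification: `beta`, `kl_ref`, and the desirable/undesirable weights are illustrative placeholders, not the values used to train this model.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(reward: float, desirable: bool,
             beta: float = 0.1, kl_ref: float = 0.0,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Simplified per-example KTO loss.

    reward: implicit reward, i.e. log p_policy(y|x) - log p_ref(y|x).
    desirable: True for a preferred completion, False for a rejected one.
    kl_ref: the reference point (an estimate of the policy/reference KL).
    """
    if desirable:
        # Desirable outputs are pushed above the reference point.
        return lambda_d * (1.0 - sigmoid(beta * (reward - kl_ref)))
    # Undesirable outputs are pushed below the reference point.
    return lambda_u * (1.0 - sigmoid(beta * (kl_ref - reward)))

# Loss falls as a desirable completion's reward rises above the reference
# point, and as an undesirable completion's reward falls below it.
print(kto_loss(2.0, desirable=True) < kto_loss(0.0, desirable=True))     # True
print(kto_loss(-2.0, desirable=False) < kto_loss(0.0, desirable=False))  # True
```

Unlike DPO, KTO scores each completion against a fixed reference point rather than requiring paired chosen/rejected responses, which is why the reward margin (desirable minus undesirable rewards) reported above is the key training signal.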