Model Overview
mlfoundations-dev/simpo-oh_teknium_scaling_down_ratiocontrolled_0.9 is an 8-billion-parameter language model fine-tuned from the base model mlfoundations-dev/oh_teknium_scaling_down_ratiocontrolled_0.9. It was trained on the mlfoundations-dev/gemma2-ultrafeedback-armorm dataset, indicating a focus on preference learning and alignment.
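The card does not show the dataset schema, but ultrafeedback/ArmoRM-style preference datasets typically pair each prompt with a chosen and a rejected response. A minimal sketch, assuming that common `prompt`/`chosen`/`rejected` layout (the field names are an assumption, not confirmed by this card):

```python
# Illustrative record in the common preference-pair layout; the field names
# ("prompt", "chosen", "rejected") are assumed, not taken from the dataset card.
example_record = {
    "prompt": "Explain why the sky is blue.",
    "chosen": "Sunlight scatters off air molecules, and shorter blue wavelengths scatter most.",
    "rejected": "The sky is blue because it reflects the color of the ocean.",
}

def is_valid_preference_record(record: dict) -> bool:
    """Check that a record carries the three string fields a preference trainer expects."""
    return all(isinstance(record.get(key), str) for key in ("prompt", "chosen", "rejected"))
```

A preference trainer consumes such pairs directly; no scalar reward labels are required, since the chosen/rejected ordering itself is the supervision signal.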
Key Performance Metrics
After a single epoch of training, the model achieved the following results on the evaluation set:
- Loss: 2.9107
- Rewards/accuracies: 0.7604
- Rewards/margins: 5.3780
The rewards/accuracies and rewards/margins metrics reflect how reliably the model distinguishes preferred from rejected responses, as is typical for models fine-tuned with preference-optimization methods such as SimPO (suggested by the model's name) or reinforcement learning from human feedback (RLHF).
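As an illustrative sketch (not this model's actual evaluation code), rewards/accuracies and rewards/margins are typically derived from per-example reward scores assigned to the chosen and rejected responses of each preference pair:

```python
def preference_metrics(chosen_rewards, rejected_rewards):
    """Compute the two metrics as preference trainers commonly report them:
    - accuracy: fraction of pairs where the chosen response outscores the rejected one
    - margin:   mean gap (chosen reward - rejected reward) across pairs
    This is an illustrative sketch of the convention, not the project's code."""
    pairs = list(zip(chosen_rewards, rejected_rewards))
    accuracy = sum(c > r for c, r in pairs) / len(pairs)
    margin = sum(c - r for c, r in pairs) / len(pairs)
    return accuracy, margin

# Example: 3 of 4 pairs rank correctly -> accuracy 0.75, mean margin 1.0
acc, margin = preference_metrics([2.0, 1.0, 3.0, 0.0], [1.0, 2.0, 0.0, -1.0])
```

By this convention, the reported 0.7604 accuracy means the model ranked the preferred response above the rejected one in roughly 76% of evaluation pairs, with an average reward gap of about 5.38.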
Training Details
The model was trained with a learning rate of 8e-07 and a total batch size of 128 across 8 GPUs. Training used a cosine learning-rate scheduler with a warmup ratio of 0.1 over a single epoch. The model's context length is 32768 tokens.
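The schedule described above can be sketched as follows; the actual run would have used the training framework's built-in scheduler, so this helper is purely illustrative, with `base_lr` and `warmup_ratio` matching the values reported here:

```python
import math

def cosine_lr_with_warmup(step, total_steps, base_lr=8e-7, warmup_ratio=0.1):
    """Linear warmup over the first warmup_ratio of steps, then cosine decay to 0.
    Illustrative helper, not the scheduler actually used in training."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from ~0 up to base_lr during warmup.
        return base_lr * (step + 1) / max(1, warmup_steps)
    # Cosine decay from base_lr toward 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With a warmup ratio of 0.1, the learning rate peaks at 8e-07 one tenth of the way through the epoch and then decays smoothly to near zero by the final step.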
Intended Use Cases
Given its fine-tuning on a preference-feedback dataset, this model is likely suitable for applications requiring:
- Preference-aligned text generation: Generating responses that align with human preferences.
- Dialogue systems: Improving the quality and helpfulness of conversational AI.
- Content moderation: Identifying and filtering undesirable content based on learned preferences.