mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-405b

Source: Hugging Face

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | License: llama3.1 | Architecture: Transformer

The mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-405b is an 8 billion parameter language model released by mlfoundations-dev, built on the Llama 3.1 architecture with a 32,768-token context length. It is a fine-tuned version of oh-dcft-v3.1-llama-3.1-405b, optimized on the mlfoundations-dev/gemma2-ultrafeedback-armorm preference dataset. Its improved reward accuracy and reward margin on the evaluation set indicate a focus on alignment and preference-learning tasks.


Model Overview

The mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-405b is an 8 billion parameter language model derived from the Llama 3.1 architecture, featuring a 32,768-token context window. It is a fine-tuned iteration of the mlfoundations-dev/oh-dcft-v3.1-llama-3.1-405b base model.

Key Characteristics

  • Fine-tuning Objective: The model was fine-tuned on the mlfoundations-dev/gemma2-ultrafeedback-armorm dataset, indicating a focus on learning from human preferences or feedback.
  • Performance Metrics: During evaluation, the model achieved a loss of 2.5125, a reward accuracy of 0.8018, and a reward margin of 7.8232. These metrics suggest improved alignment with the preferences in the training data (see the sketch after this list for how such metrics are typically computed).
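
The reward accuracy and reward margin come from comparing the policy's implicit reward on chosen versus rejected responses; under the SimPO objective this reward is the length-normalized log-likelihood scaled by a factor β. Below is a minimal sketch of that computation; the function name, the β value, and the tensor shapes are illustrative assumptions, not details taken from the model card:

```python
import torch

def simpo_rewards(chosen_logps, rejected_logps,
                  chosen_lengths, rejected_lengths, beta=2.0):
    """Length-normalized implicit rewards in the style of SimPO.

    chosen_logps / rejected_logps: summed token log-probs of each
    response under the policy (shape: [batch]).
    chosen_lengths / rejected_lengths: response lengths in tokens.
    """
    r_chosen = beta * chosen_logps / chosen_lengths
    r_rejected = beta * rejected_logps / rejected_lengths
    margins = r_chosen - r_rejected            # "reward margin"
    accuracy = (margins > 0).float().mean()    # "reward accuracy"
    return margins.mean(), accuracy
```

A reward accuracy of 0.8018 would then mean the policy assigns the higher length-normalized likelihood to the preferred response in roughly 80% of evaluation pairs.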

Training Details

The training process involved specific hyperparameters:

  • Learning Rate: 8e-07
  • Batch Sizes: A per-device train_batch_size of 2 and eval_batch_size of 2, scaled to a total_train_batch_size of 128 and total_eval_batch_size of 16 via multi-device data parallelism and gradient accumulation (see the arithmetic sketch after this list).
  • Optimizer: AdamW with default betas and epsilon.
  • Epochs: Trained for 1.0 epoch.
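
The reported totals are consistent with an 8-device data-parallel setup using 8 gradient-accumulation steps; note that the device count and accumulation steps are inferred here, since the card does not state them. A minimal arithmetic sketch:

```python
# Hypothetical reconstruction of the effective batch sizes; num_gpus
# and gradient_accumulation_steps are inferred, not stated in the card.
per_device_train_batch_size = 2
per_device_eval_batch_size = 2
num_gpus = 8                      # implied by total_eval = 2 * 8 = 16
gradient_accumulation_steps = 8   # implied by total_train = 2 * 8 * 8 = 128

total_train_batch_size = (per_device_train_batch_size
                          * num_gpus
                          * gradient_accumulation_steps)       # 128
total_eval_batch_size = per_device_eval_batch_size * num_gpus  # 16
print(total_train_batch_size, total_eval_batch_size)
```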

Potential Use Cases

Given its fine-tuning on a feedback-oriented dataset, this model is likely suitable for applications requiring:

  • Response Generation: Generating outputs that align with human preferences or specific quality criteria (a usage sketch follows this list).
  • Preference Learning Tasks: Scenarios where ranking or choosing between different responses is critical.
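
A minimal generation sketch using the Hugging Face transformers library is shown below. It assumes the repository ships a chat template (as Llama 3.1 fine-tunes typically do); the prompt and sampling parameters are illustrative, not taken from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-405b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the benefits of preference tuning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256,
                         temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```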

Limitations

The model card indicates that more information is needed regarding its intended uses, limitations, and the specifics of its training and evaluation data. Users should exercise caution and conduct further testing for specific applications.