mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b
Model Overview
This model, mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b, is an 8 billion parameter language model. It is a fine-tuned iteration of the mlfoundations-dev/oh-dcft-v3.1-llama-3.1-nemotron-70b base model, trained on the mlfoundations-dev/gemma2-ultrafeedback-armorm preference dataset. The model supports a context length of 32,768 tokens.
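As a sketch, the checkpoint can presumably be loaded like any other causal language model on the Hugging Face Hub. The dtype, device placement, and generation settings below are illustrative assumptions, not values documented in this card:

```python
# Hedged sketch: loading the model with Hugging Face Transformers.
# Assumes the checkpoint is public and AutoModelForCausalLM-compatible;
# bfloat16 and device_map="auto" are assumptions, not documented settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Summarize SimPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Loading the full model requires sufficient GPU memory; device_map="auto" lets Accelerate shard it across available devices.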
Key Performance Metrics
During evaluation, the model achieved notable results:
- Loss: 2.2807
- Rewards/accuracies: 0.8145
- Rewards/margins: 8.6361
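For context, rewards/accuracies and rewards/margins in SimPO-style preference training typically measure how often, and by how much, the implicit reward of the chosen response exceeds that of the rejected one. A minimal sketch with toy numbers follows; the beta value and the log-probabilities are illustrative, not values from this training run:

```python
# Hedged sketch of SimPO-style preference metrics. In SimPO the implicit
# reward of a response is its length-normalized log-probability scaled by
# beta; the logps passed in here are assumed to be per-token averages.
def simpo_reward_metrics(chosen_logps, rejected_logps, beta=2.0):
    chosen_rewards = [beta * lp for lp in chosen_logps]
    rejected_rewards = [beta * lp for lp in rejected_logps]
    pairs = list(zip(chosen_rewards, rejected_rewards))
    # rewards/accuracies: fraction of pairs where chosen outranks rejected.
    accuracy = sum(c > r for c, r in pairs) / len(pairs)
    # rewards/margins: mean gap between chosen and rejected rewards.
    margin = sum(c - r for c, r in pairs) / len(pairs)
    return accuracy, margin

# Toy batch of average log-probs (illustrative numbers only).
chosen = [-1.0, -0.8, -1.2, -0.5]
rejected = [-2.0, -1.5, -1.0, -3.0]
acc, margin = simpo_reward_metrics(chosen, rejected)
print(acc, margin)  # 0.75 2.0
```

Under this reading, the reported 0.8145 accuracy means the model's implicit reward ranked the chosen response above the rejected one in roughly 81% of evaluation pairs.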
Training Details
The training process used a learning rate of 8e-07, a total batch size of 128 (across 8 GPUs with gradient accumulation), and a cosine learning-rate scheduler with a warmup ratio of 0.1, over 1 epoch. Training was conducted with Transformers 4.46.1, PyTorch 2.3.0, Datasets 3.1.0, and Tokenizers 0.20.3.
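The total batch size of 128 is the product of the GPU count, the per-device batch size, and the gradient accumulation steps; the decomposition below is one plausible split, since the card reports only the total, and the step count used for the warmup arithmetic is likewise an assumed placeholder:

```python
# Hedged sketch of the batch-size and warmup arithmetic. The per-device
# batch size, accumulation steps, and total step count are assumptions
# chosen to make the arithmetic concrete, not documented values.
num_gpus = 8
per_device_batch_size = 2          # assumed
gradient_accumulation_steps = 8    # assumed
effective_batch_size = num_gpus * per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 128, matching the reported total

# With warmup_ratio = 0.1, the first 10% of optimizer steps ramp the
# learning rate up linearly before the cosine decay begins.
total_steps = 500                  # assumed
warmup_steps = int(0.1 * total_steps)
print(warmup_steps)  # 50
```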