mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b

Hosted on Hugging Face
Text Generation · Model Size: 8B · Quantization: FP8 · Context Length: 32k · License: llama3.1 · Architecture: Transformer

mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b is an 8 billion parameter language model fine-tuned from mlfoundations-dev/oh-dcft-v3.1-llama-3.1-nemotron-70b. It was trained on the mlfoundations-dev/gemma2-ultrafeedback-armorm dataset and supports a 32,768 token context length. On its evaluation set, the model reaches a rewards/accuracies score of 0.8145.


Model Overview

This model, mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b, is an 8 billion parameter language model. It is a fine-tuned iteration of the mlfoundations-dev/oh-dcft-v3.1-llama-3.1-nemotron-70b base model, trained on the mlfoundations-dev/gemma2-ultrafeedback-armorm preference dataset. The model supports a context length of 32,768 tokens.
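The checkpoint can be loaded with the standard Hugging Face Transformers API. Below is a minimal sketch; the repo id is taken from the model name above, while the dtype/device settings and the example prompt and generation length are illustrative assumptions (the FP8 quantized variant may require runtime-specific support).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # assumed: pick the checkpoint's native precision
    device_map="auto",    # assumed: place weights automatically
)

# Build a chat-formatted prompt and generate a short completion (settings are illustrative).
messages = [{"role": "user", "content": "Summarize what SimPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```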

Key Performance Metrics

On the evaluation set, the model achieved the following results:

  • Loss: 2.2807
  • Rewards/accuracies: 0.8145
  • Rewards/margins: 8.6361
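The "simpo" prefix in the model name indicates SimPO-style preference optimization, where these reward metrics come from. As a rough illustration of what rewards/accuracies and rewards/margins measure, here is a minimal sketch assuming the standard SimPO formulation (implicit reward = beta times the length-normalized log-probability of a response); the exact trainer implementation and beta value are assumptions, not taken from the model card.

```python
import torch

def simpo_reward_metrics(chosen_logps, rejected_logps, chosen_lens, rejected_lens, beta=2.0):
    """Sketch of SimPO-style reward metrics (assumed formulation, not the exact trainer code).

    chosen_logps / rejected_logps: summed log-probabilities of each response (tensors).
    chosen_lens / rejected_lens: response lengths in tokens (tensors).
    """
    # SimPO's implicit reward: beta * average per-token log-probability of the response.
    chosen_rewards = beta * chosen_logps / chosen_lens
    rejected_rewards = beta * rejected_logps / rejected_lens

    # rewards/accuracies: fraction of pairs where the chosen response scores higher.
    accuracies = (chosen_rewards > rejected_rewards).float().mean()
    # rewards/margins: average gap between chosen and rejected rewards.
    margins = (chosen_rewards - rejected_rewards).mean()
    return accuracies.item(), margins.item()
```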

Training Details

The training process used a learning rate of 8e-07, a total batch size of 128 (across 8 GPUs with gradient accumulation), and a cosine learning rate scheduler with a 0.1 warmup ratio over 1 epoch. Training was conducted with Transformers 4.46.1, PyTorch 2.3.0, Datasets 3.1.0, and Tokenizers 0.20.3.
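For reference, the reported hyperparameters map onto a Hugging Face TrainingArguments configuration roughly as follows. Only the totals are stated above, so the per-device batch size and gradient-accumulation split (and the mixed-precision flag) are assumptions for illustration.

```python
from transformers import TrainingArguments

# Sketch of the reported setup; the 2 x 8 GPUs x 8 steps split of the 128 batch size is assumed.
training_args = TrainingArguments(
    output_dir="simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b",
    learning_rate=8e-7,
    per_device_train_batch_size=2,      # assumed per-GPU micro-batch
    gradient_accumulation_steps=8,      # 2 * 8 GPUs * 8 steps = 128 effective batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                          # assumed mixed-precision setting
)
```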