mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b
Model Overview
This model, mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b, is an 8 billion parameter language model. It is a fine-tuned iteration of the mlfoundations-dev/oh-dcft-v3.1-llama-3.1-nemotron-70b base model, trained on the mlfoundations-dev/gemma2-ultrafeedback-armorm preference dataset. The model supports a context length of 32,768 tokens.
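As a sketch, the checkpoint can presumably be loaded like any other causal language model on the Hugging Face Hub. The dtype, device placement, and generation settings below are illustrative assumptions, not values documented in this card:

```python
# Hedged sketch: loading the model with Hugging Face Transformers.
# Assumes the checkpoint is public and AutoModelForCausalLM-compatible;
# bfloat16 and device_map="auto" are assumptions, not documented settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/simpo-oh-dcft-v3.1-llama-3.1-nemotron-70b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Summarize SimPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Loading the full model requires sufficient GPU memory; device_map="auto" lets Accelerate shard it across available devices.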
Key Performance Metrics
During evaluation, the model achieved notable results:
- Loss: 2.2807
- Rewards/accuracies: 0.8145
- Rewards/margins: 8.6361
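For context, rewards/accuracies and rewards/margins in SimPO-style preference training typically measure how often, and by how much, the implicit reward of the chosen response exceeds that of the rejected one. A minimal sketch with toy numbers follows; the beta value and the log-probabilities are illustrative, not values from this training run:

```python
# Hedged sketch of SimPO-style preference metrics. In SimPO the implicit
# reward of a response is its length-normalized log-probability scaled by
# beta; the logps passed in here are assumed to be per-token averages.
def simpo_reward_metrics(chosen_logps, rejected_logps, beta=2.0):
    chosen_rewards = [beta * lp for lp in chosen_logps]
    rejected_rewards = [beta * lp for lp in rejected_logps]
    pairs = list(zip(chosen_rewards, rejected_rewards))
    # rewards/accuracies: fraction of pairs where chosen outranks rejected.
    accuracy = sum(c > r for c, r in pairs) / len(pairs)
    # rewards/margins: mean gap between chosen and rejected rewards.
    margin = sum(c - r for c, r in pairs) / len(pairs)
    return accuracy, margin

# Toy batch of average log-probs (illustrative numbers only).
chosen = [-1.0, -0.8, -1.2, -0.5]
rejected = [-2.0, -1.5, -1.0, -3.0]
acc, margin = simpo_reward_metrics(chosen, rejected)
print(acc, margin)  # 0.75 2.0
```

Under this reading, the reported 0.8145 accuracy means the model's implicit reward ranked the chosen response above the rejected one in roughly 81% of evaluation pairs.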
Training Details
The training process used a learning rate of 8e-07, a total batch size of 128 (across 8 GPUs with gradient accumulation), and a cosine learning-rate scheduler with a warmup ratio of 0.1, over 1 epoch. Training was conducted with Transformers 4.46.1, PyTorch 2.3.0, Datasets 3.1.0, and Tokenizers 0.20.3.
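The total batch size of 128 is the product of the GPU count, the per-device batch size, and the gradient accumulation steps; the decomposition below is one plausible split, since the card reports only the total, and the step count used for the warmup arithmetic is likewise an assumed placeholder:

```python
# Hedged sketch of the batch-size and warmup arithmetic. The per-device
# batch size, accumulation steps, and total step count are assumptions
# chosen to make the arithmetic concrete, not documented values.
num_gpus = 8
per_device_batch_size = 2          # assumed
gradient_accumulation_steps = 8    # assumed
effective_batch_size = num_gpus * per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 128, matching the reported total

# With warmup_ratio = 0.1, the first 10% of optimizer steps ramp the
# learning rate up linearly before the cosine decay begins.
total_steps = 500                  # assumed
warmup_steps = int(0.1 * total_steps)
print(warmup_steps)  # 50
```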