mlfoundations-dev/OH_original_wo_null_sources

Text Generation · Model Size: 8B · Quantization: FP8 · Context Length: 32k · License: llama3.1 · Architecture: Transformer

mlfoundations-dev/OH_original_wo_null_sources is an 8-billion-parameter language model fine-tuned from meta-llama/Llama-3.1-8B on the mlfoundations-dev/OH_original_wo_null_sources dataset, reaching a final validation loss of 0.6013. It is intended for tasks that align with its fine-tuning data, where its specialized training gives it the most benefit.


Model Overview

This model, mlfoundations-dev/OH_original_wo_null_sources, is an 8-billion-parameter causal language model derived from meta-llama/Llama-3.1-8B and fine-tuned on the dataset of the same name.
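
Assuming the checkpoint is published on the Hugging Face Hub under the name above with a standard causal-LM layout (this card does not spell out a loading procedure), it should load with the stock transformers API. The sketch below is illustrative only; bfloat16 keeps the 8B weights at roughly 16 GB.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/OH_original_wo_null_sources"

# Load the tokenizer and weights; device_map="auto" places layers on the
# available GPU(s), and bfloat16 halves memory relative to float32.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```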

Training Details

Training used a learning rate of 5e-06, a per-device train_batch_size of 8, and gradient_accumulation_steps of 2 across 32 devices, giving a total_train_batch_size of 512 (8 × 2 × 32). The run spanned 3 epochs with an Adam optimizer and a constant learning-rate scheduler with a warmup ratio of 0.1. The final validation loss was 0.6013.
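
The card does not name the training framework, so the mapping below is an assumption: the reported hyperparameters expressed as a Hugging Face transformers TrainingArguments sketch, with unlisted values (output_dir, precision, the exact Adam variant) filled in as placeholders.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported setup; anything not listed
# on the card (output_dir, bf16, the AdamW variant) is a placeholder.
args = TrainingArguments(
    output_dir="OH_original_wo_null_sources",  # placeholder
    learning_rate=5e-06,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,             # 8 x 2 x 32 devices = 512
    num_train_epochs=3,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.1,
    optim="adamw_torch",                       # card says "Adam"; variant assumed
    bf16=True,                                 # assumption, not stated on the card
)
```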

Key Characteristics

  • Base Model: Fine-tuned from meta-llama/Llama-3.1-8B.
  • Parameter Count: 8 billion.
  • Context Length: 32,768 tokens (see the verification sketch after this list).
  • Performance: Final validation loss of 0.6013 on its evaluation set.
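
These figures can be sanity-checked from the published config without downloading the weights. A minimal sketch, assuming the checkpoint exposes a standard Llama config; note that base Llama-3.1 configs advertise a 131,072-token window, so the exported value may differ from the 32k serving limit listed above.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mlfoundations-dev/OH_original_wo_null_sources")

# Context window as exported in the config (this card lists 32k).
print(config.max_position_embeddings)

# Llama-3.1-8B derivatives report 32 hidden layers and a hidden size of 4096.
print(config.num_hidden_layers, config.hidden_size)
```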

Intended Use

This model is suited to applications that match the domain of the mlfoundations-dev/OH_original_wo_null_sources dataset it was fine-tuned on; performance will be strongest on tasks that resemble that training distribution.
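
As a usage illustration, the sketch below continues from the loading example in the Model Overview and assumes the tokenizer ships a chat template, which is common for instruction-tuned Llama-3.1 derivatives but not confirmed by this card; if it does not, fall back to plain-text prompting.

```python
# Continues from the loading sketch above; assumes a chat template exists.
messages = [{"role": "user", "content": "Summarize the benefits of fine-tuning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```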