mlfoundations-dev/OH_DCFT_V3_wo_slimorca_550k

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Oct 31, 2024 · License: llama3.1 · Architecture: Transformer

mlfoundations-dev/OH_DCFT_V3_wo_slimorca_550k is an 8-billion-parameter language model fine-tuned from Meta's Llama-3.1-8B. It was trained on the mlfoundations-dev/OH_DCFT_V3_wo_slimorca_550k dataset and reached a final validation loss of 0.6377. The model is intended for general language generation tasks, building on the capabilities of its Llama-3.1 base.


Overview

This model is a fine-tuned variant of Meta's Llama-3.1-8B base model. It retains the base model's 8 billion parameters and was further trained on the mlfoundations-dev/OH_DCFT_V3_wo_slimorca_550k dataset to adapt its behavior to the material in that set.

Training Details

The model was trained for 3 epochs with a learning rate of 5e-06 and an effective batch size of 512 (a per-device train_batch_size of 8 with gradient_accumulation_steps of 4 across 16 devices, i.e. 8 × 4 × 16 = 512). The optimizer was Adam with standard betas and epsilon, using a constant learning rate schedule after a warmup over the first 10% of steps (warmup ratio 0.1). Validation loss decreased steadily during training, reaching 0.6377 by the final epoch.
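As a rough illustration, the reported hyperparameters map onto the Hugging Face transformers Trainer API as sketched below. The card does not say which training framework was actually used, so the argument names, the output path, and the bf16 setting are all assumptions, not a record of the real run:

```python
# Illustrative sketch only: reproduces the hyperparameters reported in this
# card using transformers.TrainingArguments. Framework and precision are
# assumptions; the card does not specify them.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="OH_DCFT_V3_wo_slimorca_550k",  # hypothetical output path
    num_train_epochs=3,
    learning_rate=5e-6,
    per_device_train_batch_size=8,             # train_batch_size from the card
    gradient_accumulation_steps=4,             # 8 x 4 x 16 devices = 512 effective
    lr_scheduler_type="constant_with_warmup",  # constant schedule after warmup
    warmup_ratio=0.1,
    optim="adamw_torch",                       # Adam-style optimizer, default betas/epsilon
    bf16=True,                                 # assumed mixed precision; not stated in the card
)
```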

Potential Use Cases

Given its Llama-3.1-8B foundation, this model is likely suitable for a range of natural language processing tasks, including text generation, summarization, and question answering, especially within domains represented by its fine-tuning dataset. Further evaluation would be needed to determine its specific strengths and limitations.
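As a minimal usage sketch, the checkpoint should load like any other Llama-3.1-8B fine-tune through the standard transformers text-generation pipeline; the repo id below is taken from this card, and the prompt is purely illustrative:

```python
# Minimal inference sketch, assuming the checkpoint is hosted on the
# Hugging Face Hub under the repo id shown in this card and loads like a
# standard Llama-3.1-8B fine-tune.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mlfoundations-dev/OH_DCFT_V3_wo_slimorca_550k",
    device_map="auto",    # place weights on available GPU(s)
    torch_dtype="auto",   # use the checkpoint's native precision
)

output = generator(
    "Summarize the main benefits of instruction tuning in two sentences.",
    max_new_tokens=128,
)
print(output[0]["generated_text"])
```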