mlfoundations-dev/OH_DCFT_V3_wo_gpt4_llm

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · License: llama3.1 · Architecture: Transformer

mlfoundations-dev/OH_DCFT_V3_wo_gpt4_llm is an 8-billion-parameter language model fine-tuned from Meta Llama 3.1. It was trained on the mlfoundations-dev/OH_DCFT_V3_wo_gpt4_llm dataset, reaching a final validation loss of 0.6373, and supports a context length of 32,768 tokens. Further details on its specific capabilities and intended uses have not yet been provided.


Overview

OH_DCFT_V3_wo_gpt4_llm is an 8 billion parameter language model developed by mlfoundations-dev. It is a fine-tuned variant of the meta-llama/Llama-3.1-8B base model, specifically trained on the mlfoundations-dev/OH_DCFT_V3_wo_gpt4_llm dataset.
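Since the checkpoint follows the standard Llama 3.1 layout, it should load with the Hugging Face `transformers` library. The sketch below is illustrative, not part of the official card: the prompt and generation settings are arbitrary, and the actual download (an 8B checkpoint) is gated behind an environment flag so the script is safe to run without a GPU.

```python
import os

MODEL_ID = "mlfoundations-dev/OH_DCFT_V3_wo_gpt4_llm"

# Loading an 8B checkpoint requires substantial memory, so the demo only
# runs when RUN_DEMO=1 is set in the environment.
if os.environ.get("RUN_DEMO"):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumed dtype; the card lists FP8 for serving
        device_map="auto",
    )

    prompt = "Explain instruction tuning in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```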

Training Details

The model was trained for 3 epochs with a learning rate of 5e-06 and a total batch size of 512 distributed across 16 GPUs, reaching a final validation loss of 0.6373. Key hyperparameters included the Adam optimizer with betas=(0.9, 0.999) and epsilon=1e-08, and a constant learning-rate scheduler with a warmup ratio of 0.1.
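The total batch size of 512 is the effective batch size: per-device batch size × gradient-accumulation steps × number of GPUs. The card only states the total, so the per-device/accumulation split below is an assumed example of one configuration that produces it:

```python
# Effective batch size arithmetic for the reported setup.
NUM_GPUS = 16                    # stated in the card
PER_DEVICE_BATCH_SIZE = 8        # assumed for illustration
GRADIENT_ACCUMULATION_STEPS = 4  # assumed for illustration

effective_batch_size = PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_GPUS
print(effective_batch_size)  # 512, matching the reported total batch size
```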

Current Status

As of the current release, the documentation does not yet specify the model's intended uses, limitations, or full range of capabilities. Users are encouraged to consult future updates for information on its performance characteristics and recommended applications.