mlfoundations-dev/oh-dcft-v3.1-gemini-1.5-flash
The mlfoundations-dev/oh-dcft-v3.1-gemini-1.5-flash model is an 8-billion-parameter language model fine-tuned from Meta-Llama-3.1-8B on the mlfoundations-dev/oh-dcft-v3.1-gemini-1.5-flash dataset, reaching a validation loss of 0.5841. It is intended for general language understanding and generation tasks.
Model Overview
mlfoundations-dev/oh-dcft-v3.1-gemini-1.5-flash is an 8-billion-parameter language model fine-tuned from Meta-Llama-3.1-8B. The fine-tuning data is the dataset of the same name, mlfoundations-dev/oh-dcft-v3.1-gemini-1.5-flash. At the end of training, the model reached a validation loss of 0.5841.
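To give the reported validation loss a more intuitive scale, it can be converted to a perplexity. The sketch below assumes the loss is a mean per-token cross-entropy measured in nats, which the source does not state explicitly; the function name `perplexity_from_loss` is hypothetical.

```python
import math

# Validation loss reported for this model.
validation_loss = 0.5841


def perplexity_from_loss(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity.

    Assumption: the reported loss is a mean token-level cross-entropy in nats.
    """
    return math.exp(mean_nll_nats)


perplexity = perplexity_from_loss(validation_loss)
print(f"Perplexity on the validation set: {perplexity:.2f}")  # about 1.79
```

Perplexity computed this way is only comparable across models that share a tokenizer and an evaluation set.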
Training Details
The fine-tuning process utilized several key hyperparameters:
- Learning Rate: 5e-06
- Batch Size: 8 per device (train and eval)
- Gradient Accumulation Steps: 8
- Total Effective Train Batch Size: 512 (8 per device × 8 GPUs × 8 accumulation steps)
- Optimizer: ADAMW_TORCH
- Epochs: 3.0
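The effective batch size follows from the listed values together with the 8 GPUs reported for this run. The sketch below reproduces that arithmetic; the `FineTuneConfig` class and its field names are hypothetical, chosen for illustration, and the batch size of 8 is assumed to be per device, which is the reading consistent with a total of 512.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the hyperparameters listed above."""

    learning_rate: float = 5e-6
    per_device_batch_size: int = 8  # assumed to be per GPU
    gradient_accumulation_steps: int = 8
    num_gpus: int = 8
    num_epochs: float = 3.0

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer step:
        # per-device batch x number of devices x accumulation steps.
        return (
            self.per_device_batch_size
            * self.num_gpus
            * self.gradient_accumulation_steps
        )


config = FineTuneConfig()
print(config.effective_batch_size)  # 512
```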
Training was conducted across 8 GPUs, using Transformers 4.46.1, PyTorch 2.3.0, Datasets 3.1.0, and Tokenizers 0.20.3. Validation loss decreased consistently over the three epochs, ending at 0.5841.
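When trying to reproduce this setup, it can help to compare the locally installed library versions against the ones reported above. The following is a minimal sketch using only the standard library; the helper name `compare_versions` is hypothetical, and the PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the ones installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Framework versions reported for the training run.
REPORTED_VERSIONS = {
    "transformers": "4.46.1",
    "torch": "2.3.0",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def compare_versions(reported: dict) -> dict:
    """Map each package to (reported version, installed version or None)."""
    result = {}
    for package, expected in reported.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed locally
        result[package] = (expected, installed)
    return result


for package, (expected, installed) in compare_versions(REPORTED_VERSIONS).items():
    if installed is None:
        status = "not installed"
    elif installed.split("+")[0] == expected:
        # Drop local build suffixes such as "+cu121" before comparing.
        status = "match"
    else:
        status = f"differs (installed {installed})"
    print(f"{package}: reported {expected}, {status}")
```

A version mismatch does not necessarily prevent the model from loading; the check is only a starting point when results differ from those reported.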