CharlesLi/llama_2_sky_safe_o1_llama_3_70B_default_1000_100_full
CharlesLi/llama_2_sky_safe_o1_llama_3_70B_default_1000_100_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was fine-tuned on the generator dataset and achieved a loss of 0.8223 on its evaluation set. It is intended for tasks requiring a Llama-2-based model with this specific fine-tuning, though further details on its capabilities are not documented.
Overview
This model, llama_2_sky_safe_o1_llama_3_70B_default_1000_100_full, is a fine-tuned variant of Meta's Llama-2-7b-chat-hf. It has 7 billion parameters and was fine-tuned on the generator dataset, reaching an evaluation loss of 0.8223 on that task.
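Since the checkpoint is a standard Llama-2 causal language model, it can be loaded with the Hugging Face transformers library. The sketch below assumes the repository ships the usual tokenizer and weight files; the prompt is only an illustration.

```python
# Minimal sketch: loading the checkpoint with Hugging Face transformers.
# The repository ID comes from this model card; everything else is a
# standard causal-LM loading pattern, not code from the model's authors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_sky_safe_o1_llama_3_70B_default_1000_100_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 7B weights on one GPU
    device_map="auto",
)

prompt = "Explain what fine-tuning a language model means."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```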
Training Details
The model was trained with the following hyperparameters:
- Learning Rate: 2e-05
- Batch Sizes: `train_batch_size` of 4, `eval_batch_size` of 4
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
- Epochs: Trained for 1 epoch
The training used a multi-GPU setup with 4 devices and gradient accumulation over 2 steps, giving a total train batch size of 4 devices × 4 per device × 2 accumulation steps = 32.
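For reference, these reported hyperparameters map onto Hugging Face TrainingArguments roughly as shown below. This is a sketch assuming the standard Trainer API was used; the actual training script is not published, and the output directory name is hypothetical.

```python
# Sketch of the reported hyperparameters as Hugging Face TrainingArguments.
# Illustrative only; dataset preparation and the Trainer call are omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_sky_safe_o1_output",  # hypothetical path
    learning_rate=2e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # 4 GPUs x 4 per device x 2 steps = 32 total
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```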
Limitations
Detailed information regarding the model's specific intended uses, limitations, and the nature of the training and evaluation data is not provided in the available documentation. Users should exercise caution and conduct further testing to determine its suitability for specific applications.