CharlesLi/llama_2_sky_o1_4_full
CharlesLi/llama_2_sky_o1_4_full is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained on a 'generator' dataset and reached a validation loss of 0.6753. The model targets tasks that call for a fine-tuned Llama 2 base and inherits the base model's 4096-token context length.
Model Overview
CharlesLi/llama_2_sky_o1_4_full is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model. It was trained for a single epoch on a 'generator' dataset, finishing with a final validation loss of 0.6753.
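Because the checkpoint follows the standard Llama 2 architecture, it should load with the usual Hugging Face transformers API. A minimal sketch; the dtype and device settings are illustrative assumptions, not values from the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_sky_o1_4_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 keeps the 7B weights around ~14 GB of GPU memory
    device_map="auto",          # requires the `accelerate` package
)
```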
Key Training Details
- Base Model: meta-llama/Llama-2-7b-chat-hf
- Parameters: 7 billion
- Training Dataset: Generator dataset (specifics not detailed in README)
- Final Validation Loss: 0.6753
- Hyperparameters (mirrored in the sketch after this list):
  - Learning Rate: 2e-05
  - Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  - Batch Size: 4 per device (train and eval) with 2 gradient accumulation steps; the reported total train batch size of 32 implies training across 4 devices
  - Epochs: 1
  - LR Scheduler: Cosine with a warmup ratio of 0.1
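These hyperparameters map directly onto Hugging Face TrainingArguments. The sketch below is a hedged reconstruction rather than the author's actual training script; the output directory is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_2_sky_o1_4_full",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # 4 x 2 per device; x4 devices matches the reported total of 32
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```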
Intended Use Cases
This model is suited to applications that benefit from a fine-tuned Llama 2 variant, particularly generation tasks aligned with its 'generator' training data. Its 7B parameter count allows comparatively efficient deployment while retaining the Llama 2 chat architecture and 4096-token context window.
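Continuing from the loading sketch in the overview, a hedged inference example; the prompt format assumes the fine-tune kept the [INST] ... [/INST] chat template inherited from the Llama 2 chat base model:

```python
# Assumes `tokenizer` and `model` from the loading sketch above.
prompt = "[INST] Summarize gradient accumulation in one paragraph. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```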