Adanato/llama3_8b_instruct_ppl_baseline-llama3_8b_instruct_ppl_bin_5
Adanato/llama3_8b_instruct_ppl_baseline-llama3_8b_instruct_ppl_bin_5 is an 8 billion parameter instruction-tuned language model, fine-tuned from Meta-Llama-3-8B-Instruct. It was adapted on the llama3_8b_instruct_ppl_bin_5 dataset and can serve as a base for further specialized applications. It retains the 8192-token context length of its base model, making it suitable for tasks with moderately long contexts.
Overview
This model, named Meta-Llama-3-8B-Instruct_e1_llama3_8b_instruct_ppl_bin_5, is a fine-tuned variant of the Meta-Llama-3-8B-Instruct base model. It has 8 billion parameters and a context length of 8192 tokens. The fine-tuning process utilized the llama3_8b_instruct_ppl_bin_5 dataset, suggesting a specialization for tasks related to perplexity baseline optimization or specific data distributions.
Training Details
The model was trained with the following key hyperparameters:
- Learning Rate: 1e-05
- Batch Sizes: `train_batch_size` of 4, `eval_batch_size` of 8
- Gradient Accumulation: 8 steps, yielding a `total_train_batch_size` of 128
- Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
- Epochs: 1.0
- Environment: Multi-GPU setup with 4 devices
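The effective batch size follows from the per-device batch size, the gradient accumulation steps, and the device count. A minimal sketch in plain Python, using the values listed above; the cosine-with-warmup schedule is written out in its standard textbook form, which is assumed (not verified) to match the trainer's scheduler up to implementation details:

```python
import math

# Values from the training configuration above.
TRAIN_BATCH_SIZE = 4    # per-device train batch size
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 4
BASE_LR = 1e-05
WARMUP_RATIO = 0.1

# Effective (total) train batch size: 4 * 8 * 4 = 128.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def cosine_lr(step: int, total_steps: int) -> float:
    """Cosine learning-rate schedule with linear warmup (standard
    formulation; assumed to approximate the scheduler used here)."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 to BASE_LR over the warmup phase.
        return BASE_LR * step / max(1, warmup_steps)
    # Cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size)  # 128
```

This makes explicit why the listed `total_train_batch_size` of 128 is consistent with a per-device batch size of 4, 8 accumulation steps, and 4 GPUs.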
Intended Use
Specific intended uses and limitations are not documented. However, as an instruction-tuned Llama 3 derivative, the model should be suitable for general instruction-following tasks, and fine-tuning on llama3_8b_instruct_ppl_bin_5 implies potential performance gains on tasks aligned with that dataset's distribution.
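Because the model inherits the Llama 3 Instruct chat template, prompts should follow that format. A minimal sketch that assembles the template by hand; the special-token strings below are the published Llama 3 chat format, and in practice `tokenizer.apply_chat_template` from the `transformers` library should be preferred since it produces the same structure from the tokenizer's own template:

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a Llama 3 Instruct prompt by hand.

    Special tokens follow the published Llama 3 chat format; with
    `transformers`, `tokenizer.apply_chat_template` handles this
    automatically and is the recommended path.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # Trailing assistant header cues the model to generate its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = build_llama3_prompt(
    "You are a helpful assistant.",
    "Summarize the Llama 3 architecture in one sentence.",
)
```

Generation should stop on the `<|eot_id|>` token, as with the base Meta-Llama-3-8B-Instruct model.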