Adanato/llama3_8b_instruct_ppl_baseline-llama3_8b_instruct_ppl_bin_5

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 8k · Published: Feb 15, 2026 · License: other · Architecture: Transformer

Adanato/llama3_8b_instruct_ppl_baseline-llama3_8b_instruct_ppl_bin_5 is an 8-billion-parameter instruction-tuned language model fine-tuned from Meta-Llama-3-8B-Instruct. It was adapted on the llama3_8b_instruct_ppl_bin_5 dataset and is intended as a base for further specialized applications. It retains the 8192-token context window of its base model, which suits tasks with moderate context requirements.


Overview

This model, named Meta-Llama-3-8B-Instruct_e1_llama3_8b_instruct_ppl_bin_5, is a fine-tuned variant of the Meta-Llama-3-8B-Instruct base model, with 8 billion parameters and a context length of 8192 tokens. Fine-tuning used the llama3_8b_instruct_ppl_bin_5 dataset; the name suggests a subset selected by perplexity binning (bin 5 of a perplexity-based partition) as part of a perplexity-baseline experiment, rather than a domain-specific corpus.
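
Because the checkpoint is a standard fine-tune of Meta-Llama-3-8B-Instruct, it should load with the usual transformers APIs. The snippet below is a minimal sketch, assuming the repo id above resolves on the Hugging Face Hub and that the published weights can be cast to bf16 (the FP8 tag in the metadata may refer to serving-time quantization rather than the stored checkpoint):

```python
# Minimal loading sketch; repo id taken from the model card above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Adanato/llama3_8b_instruct_ppl_baseline-llama3_8b_instruct_ppl_bin_5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: adjust to your hardware / the shipped dtype
    device_map="auto",
)
```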

Training Details

The model was trained with the following key hyperparameters (a hedged configuration sketch follows the list):

  • Learning Rate: 1e-05
  • Batch Sizes: train_batch_size of 4, eval_batch_size of 8
  • Gradient Accumulation: 8 steps, giving a total_train_batch_size of 128 (4 per device × 8 accumulation steps × 4 GPUs)
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1
  • Epochs: 1.0 epoch
  • Environment: Multi-GPU setup with 4 devices
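
The training script itself is not published, so the configuration below is only a reconstruction of the reported values as transformers TrainingArguments; any field not listed above (output path, mixed-precision setting) is an assumption:

```python
# Hedged reconstruction of the reported hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3_8b_instruct_ppl_bin_5",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # 4 per device x 8 steps x 4 GPUs = 128 effective
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption: precision not reported in the card
)
```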

Intended Use

Specific intended uses and limitations are not stated in the model card, but its origin as an instruction-tuned Llama 3 model suggests suitability for general instruction-following tasks; fine-tuning on a particular dataset implies potential gains on tasks aligned with that dataset's distribution. An inference sketch is shown below.
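
As with other Llama 3 Instruct derivatives, prompts should go through the chat template bundled with the tokenizer. The snippet below is a sketch, assuming `model` and `tokenizer` were loaded as in the earlier example; the system and user messages are placeholders:

```python
# Instruction-following inference via the Llama 3 chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the idea of perplexity in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```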