BEAT-LLM-Backdoor/Llama-3.1-8B_long

Text Generation · Model Size: 8B · Quantization: FP8 · Context Length: 32k · Published: Oct 13, 2024 · License: other · Architecture: Transformer

BEAT-LLM-Backdoor/Llama-3.1-8B_long is an 8 billion parameter language model fine-tuned from Meta's Llama-3.1-8B-Instruct, featuring a 32768 token context length. It is designed for applications that require a Llama-3.1-8B variant with extended context capabilities.


Model Overview

BEAT-LLM-Backdoor/Llama-3.1-8B_long is an 8 billion parameter language model, fine-tuned from the meta-llama/Llama-3.1-8B-Instruct base model. It supports an extended context length of 32768 tokens, making it suitable for tasks that require processing longer inputs or generating more extensive outputs.

Key Characteristics

  • Base Model: Fine-tuned from Meta's Llama-3.1-8B-Instruct.
  • Parameter Count: 8 billion parameters.
  • Context Length: Features an extended context window of 32768 tokens (see the config check below).
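
As a quick sanity check, the advertised context window can be read directly from the model configuration. This is a minimal sketch, assuming the repository is publicly accessible on the Hugging Face Hub and exposes the standard Llama config fields:

```python
from transformers import AutoConfig

# Load only the config (no weights) and read the maximum sequence length.
config = AutoConfig.from_pretrained("BEAT-LLM-Backdoor/Llama-3.1-8B_long")
print(config.max_position_embeddings)  # expected: 32768
```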

Training Details

The model was trained using the following hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 2e-05
  • Batch Sizes: per-device train_batch_size of 4 and eval_batch_size of 8; across 4 GPUs this yields a total_train_batch_size of 16 and a total_eval_batch_size of 32.
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
  • Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
  • Epochs: Trained for 5 epochs.
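
The card does not name the training framework. If the run used the Hugging Face Trainer, the hyperparameters above would map onto a TrainingArguments configuration roughly like the sketch below; the output_dir and the exact Adam variant (adamw_torch here) are assumptions, not documented details:

```python
from transformers import TrainingArguments

# Reported hyperparameters expressed as Hugging Face TrainingArguments.
# Dataset, collator, and model loading are omitted.
training_args = TrainingArguments(
    output_dir="llama-3.1-8b-long",    # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=4,     # x4 GPUs -> total batch size 16
    per_device_eval_batch_size=8,      # x4 GPUs -> total batch size 32
    num_train_epochs=5,
    optim="adamw_torch",               # assumed Adam variant
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```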

Intended Use Cases

This model is particularly suited to applications that benefit from a Llama-3.1-8B variant with enhanced context handling (a usage sketch follows the list), such as:

  • Summarization of long documents.
  • Extended conversational AI.
  • Code generation or analysis over larger codebases.
  • Any task where the ability to process and generate text over a 32K token window is advantageous.
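
For illustration, here is a minimal long-document summarization sketch using the transformers library. It assumes the checkpoint ships the chat template inherited from Llama-3.1-8B-Instruct; long_document is a placeholder for real input, and the prompt plus the generated output must together fit in the 32768-token window:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEAT-LLM-Backdoor/Llama-3.1-8B_long"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

long_document = "..."  # placeholder: the document to summarize
messages = [
    {"role": "user",
     "content": f"Summarize the following document:\n\n{long_document}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate the summary; decode only the newly generated tokens.
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```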