Model Overview
BEAT-LLM-Backdoor/Llama-3.1-8B_long is an 8-billion-parameter language model fine-tuned from the meta-llama/Llama-3.1-8B-Instruct base model. It supports an extended context length of 32768 tokens, making it suitable for tasks that require processing longer inputs or generating more extensive outputs.
Key Characteristics
- Base Model: Fine-tuned from Meta's Llama-3.1-8B-Instruct.
- Parameter Count: 8 billion parameters.
- Context Length: Features an extended context window of 32768 tokens.
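The model card does not include usage code, so the following is a minimal loading sketch assuming the standard Hugging Face transformers API; the `load_model` helper name is illustrative, not part of the release.

```python
def load_model(model_id="BEAT-LLM-Backdoor/Llama-3.1-8B_long"):
    """Load the model and tokenizer via Hugging Face transformers.

    Imports are deferred so the sketch can be inspected without the
    heavyweight dependencies installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",  # keep the checkpoint's native precision
        device_map="auto",   # spread layers across available devices
    )
    return model, tokenizer
```

An 8B model in 16-bit precision needs roughly 16 GB of accelerator memory, so `device_map="auto"` is the usual way to shard it across whatever hardware is available.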
Training Details
The model was trained using the following hyperparameters:
- Learning Rate: 2e-05
- Batch Sizes: train_batch_size of 4 and eval_batch_size of 8 per device; with 4 GPUs, a total_train_batch_size of 16 and total_eval_batch_size of 32.
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
- Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
- Epochs: Trained for 5 epochs.
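To make the schedule concrete, the learning-rate curve implied by these hyperparameters can be sketched in pure Python. This assumes linear warmup followed by cosine decay to zero, which is the common convention (e.g. transformers' `get_cosine_schedule_with_warmup`); the exact decay floor used in training is not stated in the card.

```python
import math

def lr_at(step, total_steps, peak_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup over the first
    warmup_ratio of training, then cosine decay from peak_lr to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For a 1000-step run, the rate climbs to 2e-05 over the first 100 steps and then decays smoothly back toward zero by the final step.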
Intended Use Cases
This model is particularly suited for applications that benefit from a Llama-3.1-8B variant with enhanced context handling, such as:
- Summarization of long documents.
- Extended conversational AI.
- Code generation or analysis involving larger codebases.
- Any task where the ability to process and generate text over a 32K token window is advantageous.
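When planning long-context use cases like the above, the prompt and the generated continuation share the same 32768-token window. A small illustrative helper (not part of the model's tooling) makes the budget arithmetic explicit:

```python
def generation_budget(prompt_tokens, context_window=32768):
    """Return how many new tokens can still be generated for a prompt
    of the given token length without exceeding the context window."""
    return max(context_window - prompt_tokens, 0)
```

For example, a 30000-token document leaves 2768 tokens of headroom for the summary or reply.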