Model Overview
Vivian12300/llama-2-7b-chat-hf-mmlu is a 7-billion-parameter language model fine-tuned from meta-llama/Llama-2-7b-chat-hf. The fine-tuning used a generator dataset and aims to improve performance on tasks relevant to the MMLU (Massive Multitask Language Understanding) benchmark. The model retains the Llama-2 architecture and its 4096-token context length.
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 5e-05
- Batch Size: 1 (train), 2 (eval)
- Gradient Accumulation Steps: 16, resulting in a total effective batch size of 16
- Optimizer: Adam with standard betas and epsilon
- Scheduler: Linear learning rate scheduler
- Epochs: 30
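The hyperparameters above map roughly onto a Transformers `TrainingArguments` configuration. The sketch below is illustrative, not the original training script: the output directory is a placeholder, and the exact optimizer variant (Adam vs. AdamW) is assumed from Transformers defaults.

```python
from transformers import TrainingArguments

# Sketch of the reported configuration; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="llama-2-7b-chat-hf-mmlu",
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,  # effective train batch size: 1 * 16 = 16
    num_train_epochs=30,
    optim="adamw_torch",             # assumed Adam-family default with standard betas/epsilon
    lr_scheduler_type="linear",
)
```

These arguments would then be passed to a `Trainer` along with the model, tokenizer, and dataset.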
This configuration reflects a focused adaptation of the base Llama-2 model: the small per-device batch size is offset by gradient accumulation, and training is sustained over 30 epochs. Training was conducted with Transformers 4.42.3, PyTorch 2.3.1+cu121, Datasets 2.20.0, and Tokenizers 0.19.1.
Intended Use Cases
Given its fine-tuning on a generator dataset and its Llama-2 lineage, this model is suitable for:
- General-purpose conversational AI: Leveraging the chat capabilities of its base model.
- Reasoning and knowledge-based tasks: Potentially excelling in areas covered by the MMLU benchmark due to its specific fine-tuning.
- Text generation: As indicated by the use of a generator dataset during training.
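Because the base model is Llama-2-chat, prompts should follow the Llama-2 chat template (`[INST]` blocks with an optional `<<SYS>>` system message). A minimal formatting sketch, where the system message is illustrative and not taken from this card:

```python
def build_llama2_prompt(user_msg, system_msg="You are a helpful assistant."):
    """Wrap a user message in the Llama-2 chat prompt format."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_msg}\n"
        "<</SYS>>\n\n"
        f"{user_msg} [/INST]"
    )

prompt = build_llama2_prompt("Which planet is known as the Red Planet?")
print(prompt)
```

The resulting string can be tokenized and passed to `model.generate()` via the standard Transformers `AutoModelForCausalLM`/`AutoTokenizer` workflow.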