Vivian12300/llama-2-7b-chat-hf-mmlu

Text generation • Concurrency cost: 1 • Model size: 7B • Quantization: FP8 • Context length: 4k • Published: Sep 12, 2024 • License: llama2 • Architecture: Transformer • Open weights

Vivian12300/llama-2-7b-chat-hf-mmlu is a 7-billion-parameter chat model fine-tuned by Vivian12300 from Meta's Llama-2-7b-chat-hf, with a focus on performance related to the MMLU benchmark. It retains the base model's 4096-token context length and is intended for tasks requiring strong general knowledge and reasoning.


Model Overview

Vivian12300/llama-2-7b-chat-hf-mmlu is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model on a generator dataset, with the aim of improving performance in areas relevant to the MMLU (Massive Multitask Language Understanding) benchmark. The model retains the Llama-2 architecture and its 4096-token context length.
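
Because MMLU is a multiple-choice benchmark, one quick way to probe such a model is to compare the logits it assigns to the candidate answer letters. The following is a minimal illustrative sketch, not the evaluation setup used for this model; it assumes the checkpoint loads through the standard transformers API under the repo id above:

```python
# Minimal MMLU-style multiple-choice scoring sketch (illustrative only,
# not the author's evaluation harness). Assumes the checkpoint is
# available under this repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Vivian12300/llama-2-7b-chat-hf-mmlu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def pick_answer(question: str, choices: list[str]) -> str:
    """Return the answer letter whose next-token logit is highest."""
    prompt = question + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCD", choices)
    ) + "\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    # Logit of each candidate letter token (" A", " B", ...) after "Answer:"
    letter_ids = [
        tokenizer.encode(f" {l}", add_special_tokens=False)[-1] for l in "ABCD"
    ]
    return "ABCD"[int(next_token_logits[letter_ids].argmax())]

print(pick_answer(
    "What is the powerhouse of the cell?",
    ["Ribosome", "Mitochondrion", "Nucleus", "Golgi apparatus"],
))  # expected: "B"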

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 5e-05
  • Batch Size: 1 (train), 2 (eval)
  • Gradient Accumulation Steps: 16, resulting in a total effective batch size of 16
  • Optimizer: Adam with standard betas and epsilon
  • Scheduler: Linear learning rate scheduler
  • Epochs: 30

This configuration (a per-device batch of 1 with 16 gradient-accumulation steps, for an effective batch size of 16, run over 30 epochs) points to task-specific adaptation of the base Llama-2 model rather than broad continued pretraining. Training used Transformers 4.42.3, PyTorch 2.3.1+cu121, Datasets 2.20.0, and Tokenizers 0.19.1.
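
For reference, the reported hyperparameters map onto a transformers TrainingArguments configuration roughly as sketched below. The output path and the exact Adam variant are assumptions, since the card only states "Adam with standard betas and epsilon":

```python
# Hedged sketch of a TrainingArguments setup matching the reported
# hyperparameters. Optimizer betas/epsilon are the library defaults
# (assumed here, since the card only says "standard").
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-2-7b-chat-hf-mmlu",  # hypothetical output path
    learning_rate=5e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,         # effective batch size: 1 * 16 = 16
    num_train_epochs=30,
    lr_scheduler_type="linear",
    optim="adamw_torch",                    # Adam-style optimizer, default betas/eps
)
```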

Intended Use Cases

Given its fine-tuning on a generator dataset and its Llama-2 lineage, this model is suitable for the following use cases (a minimal inference sketch appears after the list):

  • General-purpose conversational AI: Leveraging the chat capabilities of its base model.
  • Reasoning and knowledge-based tasks: Potentially excelling in areas covered by the MMLU benchmark due to its specific fine-tuning.
  • Text generation: As indicated by the use of a generator dataset during training.
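
A minimal inference sketch, assuming the tokenizer inherits the standard Llama-2 chat template from the base model:

```python
# Minimal chat-generation sketch. Assumes the tokenizer carries the
# standard Llama-2 chat template inherited from the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Vivian12300/llama-2-7b-chat-hf-mmlu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain entropy in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```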