Vivian12300/llama-2-7b-chat-hf-mmlu

Text generation • Concurrency cost: 1 • Model size: 7B • Quantization: FP8 • Context length: 4k • Published: Sep 12, 2024 • License: llama2 • Architecture: Transformer • Open weights

Vivian12300/llama-2-7b-chat-hf-mmlu is a 7-billion-parameter chat model fine-tuned by Vivian12300 from Meta's Llama-2-7b-chat-hf, with a focus on performance related to the MMLU benchmark. It retains the base model's 4096-token context length and is intended for tasks requiring strong general knowledge and reasoning.


Model Overview

Vivian12300/llama-2-7b-chat-hf-mmlu is a 7-billion-parameter language model fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model on a generator dataset, with the aim of improving performance in areas relevant to the MMLU (Massive Multitask Language Understanding) benchmark. The model retains the Llama-2 architecture and its 4096-token context length.
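
Because MMLU is a multiple-choice benchmark, one quick way to probe such a model is to compare the logits it assigns to the candidate answer letters. The following is a minimal illustrative sketch, not the evaluation setup used for this model; it assumes the checkpoint loads through the standard transformers API under the repo id above:

```python
# Minimal MMLU-style multiple-choice scoring sketch (illustrative only,
# not the author's evaluation harness). Assumes the checkpoint is
# available under this repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Vivian12300/llama-2-7b-chat-hf-mmlu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def pick_answer(question: str, choices: list[str]) -> str:
    """Return the answer letter whose next-token logit is highest."""
    prompt = question + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCD", choices)
    ) + "\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    # Logit of each candidate letter token (" A", " B", ...) after "Answer:"
    letter_ids = [
        tokenizer.encode(f" {l}", add_special_tokens=False)[-1] for l in "ABCD"
    ]
    return "ABCD"[int(next_token_logits[letter_ids].argmax())]

print(pick_answer(
    "What is the powerhouse of the cell?",
    ["Ribosome", "Mitochondrion", "Nucleus", "Golgi apparatus"],
))  # expected: "B"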

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 5e-05
  • Batch Size: 1 (train), 2 (eval)
  • Gradient Accumulation Steps: 16, resulting in a total effective batch size of 16
  • Optimizer: Adam with standard betas and epsilon
  • Scheduler: Linear learning rate scheduler
  • Epochs: 30

This configuration (a per-device batch of 1 with 16 gradient-accumulation steps, for an effective batch size of 16, run over 30 epochs) points to task-specific adaptation of the base Llama-2 model rather than broad continued pretraining. Training used Transformers 4.42.3, PyTorch 2.3.1+cu121, Datasets 2.20.0, and Tokenizers 0.19.1.
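
For reference, the reported hyperparameters map onto a transformers TrainingArguments configuration roughly as sketched below. The output path and the exact Adam variant are assumptions, since the card only states "Adam with standard betas and epsilon":

```python
# Hedged sketch of a TrainingArguments setup matching the reported
# hyperparameters. Optimizer betas/epsilon are the library defaults
# (assumed here, since the card only says "standard").
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-2-7b-chat-hf-mmlu",  # hypothetical output path
    learning_rate=5e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,         # effective batch size: 1 * 16 = 16
    num_train_epochs=30,
    lr_scheduler_type="linear",
    optim="adamw_torch",                    # Adam-style optimizer, default betas/eps
)
```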

Intended Use Cases

Given its fine-tuning on a generator dataset and its Llama-2 lineage, this model is suitable for the following use cases (a minimal inference sketch appears after the list):

  • General-purpose conversational AI: Leveraging the chat capabilities of its base model.
  • Reasoning and knowledge-based tasks: Potentially excelling in areas covered by the MMLU benchmark due to its specific fine-tuning.
  • Text generation: As indicated by the use of a generator dataset during training.
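
A minimal inference sketch, assuming the tokenizer inherits the standard Llama-2 chat template from the base model:

```python
# Minimal chat-generation sketch. Assumes the tokenizer carries the
# standard Llama-2 chat template inherited from the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Vivian12300/llama-2-7b-chat-hf-mmlu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain entropy in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```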