Xinging/llama2-7b_sft_0.3_ratio_alpaca_gpt4_proj_by_mmlu_ntrain_64

Text Generation · Model Size: 7B · Quant: FP8 · Context Length: 4k · Concurrency Cost: 1 · Published: Jan 24, 2025 · License: other · Architecture: Transformer

Xinging/llama2-7b_sft_0.3_ratio_alpaca_gpt4_proj_by_mmlu_ntrain_64 is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-hf on the 0.3_ratio_alpaca_gpt4_proj_by_mmlu_ntrain_64 dataset. It supports a 4096-token context length and is intended for instruction-following tasks aligned with its fine-tuning data.

Model Overview

This model, llama2-7b_sft_0.3_ratio_alpaca_gpt4_proj_by_mmlu_ntrain_64, is a fine-tuned version of the Meta Llama-2-7b-hf base model, with 7 billion parameters and a 4096-token context length. The dataset name suggests instruction-following data derived from Alpaca and GPT-4 generated responses, subsampled at a 0.3 ratio via an MMLU-based projection with ntrain=64.
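As a usage illustration, the model can be loaded with the standard Hugging Face Transformers API. This is a minimal sketch assuming the repository follows the usual Llama-2 weight layout; the Alpaca-style prompt template is an assumption based on the dataset name, not a documented requirement:

```python
# Minimal inference sketch; assumes the repo hosts standard
# Hugging Face weights for a Llama-2-style causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Xinging/llama2-7b_sft_0.3_ratio_alpaca_gpt4_proj_by_mmlu_ntrain_64"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # a 7B model in fp16 fits on a single 24 GB GPU
    device_map="auto",
)

# Alpaca-style instruction prompt (an assumption based on the
# dataset name; adjust if the repo documents a different template).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what MMLU measures.\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```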

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 2e-05
  • Batch Size: 32 per device (train), 8 per device (eval), for a total train batch size of 128 across 4 GPUs
  • Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler with a 0.03 warmup ratio
  • Epochs: 3.0

This is a standard fine-tuning configuration for instruction-tuned models; the fine-tuning dataset is the primary differentiator, implying the model's strengths lie in tasks aligned with that data distribution.
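For reference, the reported hyperparameters map onto the standard transformers TrainingArguments roughly as below. This is a reconstruction from the model card, not the authors' actual training script; the output_dir and optim values are illustrative assumptions, and 32 per device across 4 GPUs yields the stated total of 128 without gradient accumulation:

```python
# Sketch of the reported fine-tuning configuration using the
# standard transformers Trainer API.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-7b_sft_0.3_ratio_alpaca_gpt4_proj_by_mmlu_ntrain_64",
    learning_rate=2e-5,                 # reported learning rate
    per_device_train_batch_size=32,     # 32 x 4 GPUs = 128 total
    per_device_eval_batch_size=8,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",         # cosine decay
    warmup_ratio=0.03,                  # 3% of steps for warmup
    optim="adamw_torch",                # AdamW optimizer
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```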