CHIH-HUNG/llama-2-13b-FINETUNE1_17w-gate_up_down_proj

Text generation · Concurrency cost: 1 · Model size: 13B · Quantization: FP8 · Context length: 4K · Published: Sep 3, 2023 · License: llama2 · Architecture: Transformer · Open weights

The CHIH-HUNG/llama-2-13b-FINETUNE1_17w-gate_up_down_proj model is a 13-billion-parameter, Llama-2-based language model fine-tuned by CHIH-HUNG on the huangyt/FINETUNE1 dataset of approximately 170,000 examples. The LoRA fine-tune specifically targets the gate_proj, up_proj, and down_proj MLP projection layers. It outperforms the base Llama-2-13b model on benchmarks such as HellaSwag, MMLU, and TruthfulQA, making it suitable for general language understanding and generation tasks.


Overview

This model, CHIH-HUNG/llama-2-13b-FINETUNE1_17w-gate_up_down_proj, is a 13-billion-parameter language model built on the meta-llama/Llama-2-13b-hf base model. It was fine-tuned by CHIH-HUNG on the huangyt/FINETUNE1 dataset, which contains approximately 170,000 training examples.

Fine-Tuning Details

The fine-tuning process used LoRA (Low-Rank Adaptation) with a rank of 8, specifically targeting the gate_proj, up_proj, and down_proj MLP projection layers. Training ran for 1 epoch on a single RTX 4090 GPU with a batch size of 8, a learning rate of 5e-5, and bf16 precision with 4-bit quantization, reaching a final loss of 0.66 after 16 hours and 24 minutes.
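
For reference, this setup can be approximated with the Transformers and PEFT libraries. The following is a minimal sketch, not the author's actual training script; LoRA hyperparameters beyond the rank (e.g., alpha, dropout) are not stated in the card and are left at library defaults here.

```python
# Minimal sketch of the described setup using Transformers + PEFT.
# Not the author's actual script; unstated hyperparameters use library defaults.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization with bf16 compute, matching the reported precision setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA rank 8, applied only to the MLP projections named in the model card
lora_config = LoraConfig(
    r=8,
    target_modules=["gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```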

Performance Evaluation

Evaluations against the HuggingFaceH4/open_llm_leaderboard benchmarks show improvements compared to the base Llama-2-13b model:

  • Average: 58.81 (base: 56.9)
  • HellaSwag: 82.26 (base: 80.97)
  • MMLU: 55.89 (base: 54.34)
  • TruthfulQA: 39.93 (base: 34.17)

While the ARC score decreased slightly, the model overall shows stronger common-sense reasoning, multi-task language understanding, and factual accuracy than its base counterpart.
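
The Open LLM Leaderboard scores above are produced with EleutherAI's lm-evaluation-harness. Below is a rough reproduction sketch using the harness's Python API; the task names and default few-shot settings are assumptions, and the exact harness version behind the leaderboard numbers may differ.

```python
# Rough reproduction sketch using lm-evaluation-harness (pip install lm-eval).
# Task names and few-shot settings are assumptions; leaderboard settings may differ.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=CHIH-HUNG/llama-2-13b-FINETUNE1_17w-gate_up_down_proj,"
        "dtype=bfloat16"
    ),
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
)
print(results["results"])
```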

Usage

This model is suited to general language understanding and generation tasks, benefiting from fine-tuning on a substantial dataset and targeted LoRA adaptation of the MLP projections.
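
A minimal inference sketch with the Transformers library is shown below; the prompt and generation parameters are illustrative, not taken from the model card.

```python
# Minimal inference sketch; prompt and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CHIH-HUNG/llama-2-13b-FINETUNE1_17w-gate_up_down_proj"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain the difference between supervised and unsupervised learning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```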