CHIH-HUNG/llama-2-13b-FINETUNE4_3.8w-r4-q_k_v_o

Text Generation · Model Size: 13B · Quantization: FP8 · Context Length: 4k · Published: Sep 20, 2023 · License: llama2 · Architecture: Transformer · Open Weights

CHIH-HUNG/llama-2-13b-FINETUNE4_3.8w-r4-q_k_v_o is a 13 billion parameter Llama-2-based language model fine-tuned by CHIH-HUNG using the huangyt/FINETUNE4 dataset, comprising approximately 38,000 data points. This model utilizes LoRA with a rank of 16, targeting q_proj, k_proj, v_proj, and o_proj layers, and is optimized for general language understanding and generation tasks. It demonstrates competitive performance across benchmarks like ARC, HellaSwag, MMLU, and TruthfulQA, making it suitable for applications requiring robust reasoning and factual recall.


Model Overview

This model, CHIH-HUNG/llama-2-13b-FINETUNE4_3.8w-r4-q_k_v_o, is a 13 billion parameter language model built on Meta's Llama-2-13b. It was fine-tuned by CHIH-HUNG on the huangyt/FINETUNE4 dataset, which consists of approximately 38,000 training examples. Fine-tuning used LoRA (Low-Rank Adaptation) with a rank of 16, targeting the q_proj, k_proj, v_proj, and o_proj attention projections, and was performed on a single NVIDIA RTX 4090 GPU.
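For reference, a minimal usage sketch with the Hugging Face transformers library is shown below. The repository id comes from this card; the prompt and generation settings are purely illustrative and not part of the original card.

```python
# Minimal sketch: load the model and generate text with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CHIH-HUNG/llama-2-13b-FINETUNE4_3.8w-r4-q_k_v_o"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the training precision noted below
    device_map="auto",
)

prompt = "Explain the difference between precision and recall."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```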

Training Details

  • Base Model: meta-llama/Llama-2-13b-hf
  • Dataset: huangyt/FINETUNE4 (approx. 38,000 entries)
  • PEFT Type: LoRA (rank 16); a configuration sketch follows this list
  • Target Layers: q_proj, k_proj, v_proj, o_proj
  • Batch Size: 8 (per device), with 8 gradient accumulation steps
  • Learning Rate: 4e-4
  • Precision: BF16, with load_in_4bit quantization during training
  • Training Loss: 0.579 after 1 epoch
  • Training Time: 4 hours 6 minutes, using DeepSpeed
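The hyperparameters above map roughly onto a PEFT LoRA setup along the following lines. This is a sketch under stated assumptions, not the author's actual training script: values such as lora_alpha, lora_dropout, and the output directory are not given on this card and are placeholders; only the rank, target modules, batch size, gradient accumulation, learning rate, epoch count, 4-bit loading, and BF16 precision are taken from the list above. DeepSpeed, mentioned above, would be configured separately.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-2-13b-hf"

# Load the base model in 4-bit, as noted under Precision above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the attention projections listed above.
lora_config = LoraConfig(
    r=16,                          # rank stated on this card
    lora_alpha=32,                 # assumption: not stated on this card
    lora_dropout=0.05,             # assumption: not stated on this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Hyperparameters from the list above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="llama-2-13b-finetune4-lora",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,
    learning_rate=4e-4,
    num_train_epochs=1,
    bf16=True,
)
```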

Evaluation & Performance

The model was evaluated against the base Llama-2-13b on four benchmarks: ARC, HellaSwag, MMLU, and TruthfulQA. Local evaluations, run with load_in_8bit quantization, yield an average score of 56.67. On the HuggingFaceH4/open_llm_leaderboard, this configuration achieved an average of 57.98, with 54.78 on ARC, 81.40 on HellaSwag, 54.73 on MMLU, and 41.02 on TruthfulQA, indicating well-rounded capability across reasoning and knowledge-based tasks.
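A local evaluation along these lines could be reproduced with EleutherAI's lm-evaluation-harness. The sketch below is an assumption, not the author's recorded procedure: it presumes harness version 0.4+ and illustrative task names, since the exact harness version and task set used are not stated on this card.

```python
import lm_eval

# Task names and batch size are assumptions; only the model id and the
# load_in_8bit setting come from the evaluation notes above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=CHIH-HUNG/llama-2-13b-FINETUNE4_3.8w-r4-q_k_v_o,"
        "load_in_8bit=True"
    ),
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
    batch_size=8,
)
print(results["results"])
```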