Model Overview
CHIH-HUNG/llama-2-13b-FINETUNE2_TEST_2.2w is a 13-billion-parameter language model derived from the meta-llama/Llama-2-13b-hf base model. It was fine-tuned on the huangyt/FINETUNE2_TEST dataset, which comprises approximately 22,000 entries (likely the source of the "2.2w" suffix, where "w" abbreviates the Chinese unit 万, ten thousand).
Fine-Tuning Details
The fine-tuning run used a single RTX 4090 GPU and LoRA (Low-Rank Adaptation) with a rank of 8, targeting the gate_proj, up_proj, and down_proj projection layers. Training ran for 1 epoch with a learning rate of 5e-5, a per-device batch size of 8, and 8 gradient accumulation steps, using bf16 precision with 4-bit quantization (a QLoRA-style setup). The final training loss was 0.567, with a total runtime of 2 hours and 47 minutes.
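To make these hyperparameters concrete, the back-of-the-envelope arithmetic below estimates the effective batch size and the number of trainable LoRA parameters they imply. The Llama-2-13b dimensions used here (hidden size 5120, MLP intermediate size 13824, 40 layers) are not stated in the card and are an assumption; treat this as a sketch, not the card's own accounting.

```python
# Back-of-the-envelope check of the LoRA setup described above.
# Assumed Llama-2-13b dimensions (not stated in the model card):
HIDDEN = 5120         # hidden size
INTERMEDIATE = 13824  # MLP intermediate size
NUM_LAYERS = 40       # transformer blocks

RANK = 8  # LoRA rank used for this fine-tune

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A LoRA adapter adds two low-rank matrices: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

# Targeted modules per layer: gate_proj and up_proj map hidden -> intermediate,
# down_proj maps intermediate -> hidden.
per_layer = (
    lora_params(HIDDEN, INTERMEDIATE, RANK)    # gate_proj
    + lora_params(HIDDEN, INTERMEDIATE, RANK)  # up_proj
    + lora_params(INTERMEDIATE, HIDDEN, RANK)  # down_proj
)
total_trainable = per_layer * NUM_LAYERS

# Effective batch size = per-device batch size x gradient accumulation steps.
effective_batch = 8 * 8

print(f"trainable LoRA params: {total_trainable:,}")  # 18,186,240 (~18.2M)
print(f"effective batch size: {effective_batch}")     # 64
```

Under these assumptions only about 18M of the 13B parameters are trained, which is what makes a single-GPU run with 4-bit quantization feasible.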
Performance Benchmarks
Evaluations against the HuggingFaceH4/open_llm_leaderboard benchmarks show the following average scores compared to its base and chat variants:
- CHIH-HUNG/llama-2-13b-FINETUNE2_TEST_2.2w: 58.46 average
  - ARC: 56.23
  - HellaSwag: 82.7 (improved from 80.97 on base Llama-2-13b-hf)
  - MMLU: 55.35 (improved from 54.34 on base Llama-2-13b-hf)
  - TruthfulQA: 39.55
- meta-llama/Llama-2-13b-hf: 56.9 average
- meta-llama/Llama-2-13b-chat-hf: 59.93 average
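As a sanity check, the 58.46 average reported for the fine-tuned model can be reproduced directly from the four per-task scores listed above:

```python
# Recompute the reported leaderboard average from the per-task scores.
scores = {
    "ARC": 56.23,
    "HellaSwag": 82.7,
    "MMLU": 55.35,
    "TruthfulQA": 39.55,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 58.46
```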
This fine-tuned model shows particular improvements in HellaSwag and MMLU scores over the original Llama-2-13b-hf, indicating enhanced common sense reasoning and general knowledge capabilities.
Usage
This model is suitable for tasks requiring general language understanding and generation, especially where improved performance on benchmarks like HellaSwag and MMLU is beneficial. The original model card also includes a Python script for converting datasets to JSON format, which can be useful when preparing data for similar fine-tuning runs.
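The conversion script itself is not reproduced here; a minimal sketch of such a dataset-to-JSON conversion might look like the following. The instruction/input/output field names and the example records are hypothetical, not taken from the card, so adjust them to match the actual huangyt/FINETUNE2_TEST schema.

```python
import json

# Hypothetical example records; the real dataset schema may differ --
# rename the fields to match your data before fine-tuning.
records = [
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"},
    {"instruction": "Summarize.", "input": "A long article...", "output": "A summary."},
]

def to_json_lines(rows, path):
    """Write one JSON object per line (JSONL), a common fine-tuning format."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

to_json_lines(records, "finetune_data.json")
```

JSONL (one object per line) is used here because it streams well for large datasets; a single JSON array would also work if the downstream training script expects one.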