Model Overview
CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w is a fine-tuned version of the 13-billion-parameter Llama-2-13b model. It was trained by CHIH-HUNG on the huangyt/FINETUNE3 dataset, which consists of approximately 33,000 entries. Fine-tuning used LoRA (Low-Rank Adaptation) with rank 16, targeting the gate_proj, up_proj, and down_proj projection layers.
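As a refresher on what the rank-16 adaptation does, here is a minimal pure-Python sketch of the LoRA update: the frozen weight W is augmented with a low-rank delta B·A scaled by alpha/r, with B zero-initialized so training starts exactly from the base weights. The dimensions and alpha value below are illustrative only, not taken from this model.

```python
# Minimal sketch of the LoRA update W' = W + (alpha/r) * B @ A.
# Shapes and alpha are illustrative; this model uses lora_rank=16
# on the gate_proj, up_proj, and down_proj layers.

def matmul(a, b):
    # naive matrix multiply for small illustrative matrices
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

d_out, d_in, r = 4, 4, 2                # hypothetical dims (model uses r=16)
alpha = 16                              # hypothetical scaling factor

A = [[0.1] * d_in for _ in range(r)]    # r x d_in, normally random-initialized
B = [[0.0] * r for _ in range(d_out)]   # d_out x r, zero-initialized

delta = matmul(B, A)                    # d_out x d_in low-rank update
scale = alpha / r
print(delta[0][0] * scale)              # 0.0 at init: training starts from W
```

Because B starts at zero, the adapted model is initially identical to the base model; only the small A and B matrices are trained, which is what makes a single-GPU run like this one feasible.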
Fine-Tuning Details
- Base Model: meta-llama/Llama-2-13b-hf
- Dataset: huangyt/FINETUNE3 (approx. 33,000 samples)
- Hardware: Single RTX 4090 GPU
- PEFT Type: LoRA
- Training Parameters:
  - lora_rank: 16
  - per_device_train_batch_size: 8
  - gradient_accumulation_steps: 8
  - learning_rate: 4e-4
  - epoch: 1
  - precision: bf16
  - quantization: load_in_4bit
- Training Loss: 0.579; training took 4 hours and 6 minutes using DeepSpeed.
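For readers who want to reproduce a similar setup, the parameters above map roughly onto a peft/transformers configuration as follows. This is a hedged sketch assuming the Hugging Face peft, transformers, and bitsandbytes libraries; `lora_alpha`, `lora_dropout`, and the output path are illustrative values not stated in this card.

```python
# Sketch of a LoRA fine-tuning configuration matching the parameters above.
# Assumes peft/transformers/bitsandbytes are installed; lora_alpha,
# lora_dropout, and output_dir are illustrative, not stated in the card.
from transformers import TrainingArguments, BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True)   # quantization: load_in_4bit

lora_config = LoraConfig(
    r=16,                                            # lora_rank: 16
    lora_alpha=32,                                   # illustrative value
    lora_dropout=0.05,                               # illustrative value
    target_modules=["gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="llama-2-13b-finetune3",              # hypothetical path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,
    learning_rate=4e-4,
    num_train_epochs=1,
    bf16=True,
)
```

Note the effective batch size is 8 × 8 = 64 (per-device batch size times gradient accumulation steps), which keeps memory usage within a single RTX 4090 when combined with 4-bit quantization.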
Evaluation & Performance
Evaluations were conducted against the Llama-2-13b and Llama-2-13b-chat-hf models across four benchmarks: ARC, HellaSwag, MMLU, and TruthfulQA. The fine-tuned model achieved an average score of 58.9, improving over the base Llama-2-13b on HellaSwag (82.38 vs 80.97) and TruthfulQA (39.73 vs 34.17) while remaining close to it on ARC and MMLU.
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA |
|---|---|---|---|---|---|
| meta-llama/Llama-2-13b-hf | 56.9 | 58.11 | 80.97 | 54.34 | 34.17 |
| meta-llama/Llama-2-13b-chat-hf | 59.93 | 59.04 | 81.94 | 54.64 | 44.12 |
| CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w | 58.9 | 58.95 | 82.38 | 54.56 | 39.73 |
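The per-model averages can be reproduced directly from the four benchmark scores; the snippet below is plain arithmetic over the values listed in the table.

```python
# Recompute each model's average from its four benchmark scores
# (ARC, HellaSwag, MMLU, TruthfulQA), as listed in the table above.
scores = {
    "meta-llama/Llama-2-13b-hf":            [58.11, 80.97, 54.34, 34.17],
    "meta-llama/Llama-2-13b-chat-hf":       [59.04, 81.94, 54.64, 44.12],
    "CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w": [58.95, 82.38, 54.56, 39.73],
}
averages = {model: sum(s) / len(s) for model, s in scores.items()}
for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
```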
Intended Use
This model is suitable for tasks requiring general language understanding and generation, particularly where improvements in common-sense reasoning (HellaSwag) and factual accuracy (TruthfulQA) are beneficial compared to the base Llama-2-13b model.