CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w
CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w is a 13-billion-parameter language model fine-tuned from Meta's Llama-2-13b on the huangyt/FINETUNE3 dataset, which contains approximately 33,000 data points. It was trained with LoRA on a single RTX4090 GPU, with the aim of improving performance on general language understanding benchmarks. Compared with its base model, Llama-2-13b, it scores higher on HellaSwag (82.38 vs 80.97) and TruthfulQA (39.73 vs 34.17).
Model Overview
CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w is a fine-tuned version of the 13-billion-parameter Llama-2-13b model. It was trained by CHIH-HUNG on the huangyt/FINETUNE3 dataset, which consists of approximately 33,000 entries. Fine-tuning used LoRA (Low-Rank Adaptation) with a rank of 16, applied to the gate_proj, up_proj, and down_proj layers.
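To give a sense of the adapter's size, the sketch below estimates how many parameters a rank-16 LoRA adds on the three targeted projections. The layer dimensions (hidden size 5120, intermediate size 13824, 40 decoder layers) come from the published Llama-2-13b configuration rather than from this card. The helper `lora_params` is an illustrative name, and the count assumes the standard LoRA layout with no bias terms.

```python
# Rough count of trainable LoRA parameters.
# Assumption: standard LoRA, where each adapted weight W gets two
# low-rank matrices A (rank x in_features) and B (out_features x rank).
# Dimensions below are from the public Llama-2-13b config, not this card.
HIDDEN_SIZE = 5120
INTERMEDIATE_SIZE = 13824
NUM_LAYERS = 40
LORA_RANK = 16  # stated in the card

# (in_features, out_features) of each targeted projection in one decoder layer
TARGETS = {
    "gate_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "up_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "down_proj": (INTERMEDIATE_SIZE, HIDDEN_SIZE),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_RANK) for i, o in TARGETS.values())
total_trainable = per_layer * NUM_LAYERS

print(f"per layer: {per_layer:,}")
print(f"total:     {total_trainable:,}")
```

Under these assumptions the adapter has roughly 36 million trainable parameters, well under 1% of the 13 billion parameters in the base model.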
Fine-Tuning Details
- Base Model: meta-llama/Llama-2-13b-hf
- Dataset: huangyt/FINETUNE3 (approx. 33,000 samples)
- Hardware: Single RTX4090 GPU
- PEFT Type: LoRA
- Training Parameters:
  - lora_rank: 16
  - per_device_train_batch_size: 8
  - gradient_accumulation_steps: 8
  - learning_rate: 4e-4
  - epoch: 1
  - precision: bf16
  - quantization: load_in_4bit
- Training Loss: 0.579 (training took 4 hours and 6 minutes using DeepSpeed)
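The batch-related settings above imply an effective batch size and an approximate number of optimizer steps, which the following sketch works out. The `FinetuneConfig` class is a hypothetical container, not part of the original training script, and the dataset size is the approximate figure from the card, so the step count is only an estimate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hypothetical holder for the hyperparameters listed in the card."""
    lora_rank: int = 16
    per_device_train_batch_size: int = 8
    gradient_accumulation_steps: int = 8
    learning_rate: float = 4e-4
    epochs: int = 1
    num_gpus: int = 1            # single RTX4090, per the card
    dataset_size: int = 33_000   # approximate, per the card

    @property
    def effective_batch_size(self) -> int:
        # Samples consumed per optimizer update
        return (
            self.per_device_train_batch_size
            * self.gradient_accumulation_steps
            * self.num_gpus
        )

    @property
    def optimizer_steps(self) -> int:
        # Assumes the last partial batch is kept and no sequence packing is used
        steps_per_epoch = math.ceil(self.dataset_size / self.effective_batch_size)
        return steps_per_epoch * self.epochs


cfg = FinetuneConfig()
print("effective batch size:", cfg.effective_batch_size)
print("approx. optimizer steps:", cfg.optimizer_steps)
```

With these values each update sees 64 samples, so a single epoch over about 33,000 samples corresponds to a little over 500 optimizer steps.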
Evaluation & Performance
The model was compared with meta-llama/Llama-2-13b-hf and meta-llama/Llama-2-13b-chat-hf on four benchmarks: ARC, HellaSwag, MMLU, and TruthfulQA. The fine-tuned model achieved an average score of 58.9, against 56.9 for the base Llama-2-13b. The largest gains over the base are in HellaSwag (82.38 vs 80.97) and TruthfulQA (39.73 vs 34.17), while ARC (58.95 vs 58.11) and MMLU (54.56 vs 54.34) are only marginally higher. It remains below Llama-2-13b-chat-hf on average (59.93) and on TruthfulQA (44.12).
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA |
|---|---|---|---|---|---|
| meta-llama/Llama-2-13b-hf | 56.9 | 58.11 | 80.97 | 54.34 | 34.17 |
| meta-llama/Llama-2-13b-chat-hf | 59.93 | 59.04 | 81.94 | 54.64 | 44.12 |
| CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w | 58.9 | 58.95 | 82.38 | 54.56 | 39.73 |
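The card does not say how the Average column is computed. The sketch below assumes it is the unweighted mean of the four benchmark scores, checks that assumption against the reported values, and prints the per-benchmark difference between the fine-tuned model and the base model. All scores are copied from the table above.

```python
BENCHMARKS = ("ARC", "HellaSwag", "MMLU", "TruthfulQA")

BASE = "meta-llama/Llama-2-13b-hf"
CHAT = "meta-llama/Llama-2-13b-chat-hf"
TUNED = "CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w"

# Scores in the same order as BENCHMARKS
SCORES = {
    BASE: (58.11, 80.97, 54.34, 34.17),
    CHAT: (59.04, 81.94, 54.64, 44.12),
    TUNED: (58.95, 82.38, 54.56, 39.73),
}

# Average column as reported in the table
REPORTED_AVERAGE = {BASE: 56.9, CHAT: 59.93, TUNED: 58.9}


def mean(values):
    """Unweighted arithmetic mean (assumed definition of 'Average')."""
    return sum(values) / len(values)


averages = {model: mean(scores) for model, scores in SCORES.items()}

# Fine-tuned minus base, per benchmark
deltas = {
    name: tuned - base
    for name, base, tuned in zip(BENCHMARKS, SCORES[BASE], SCORES[TUNED])
}

for model, avg in averages.items():
    print(f"{model}: computed {avg:.3f}, reported {REPORTED_AVERAGE[model]}")
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

The computed means agree with the reported averages to within rounding, which is consistent with a simple mean over the four benchmarks.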
Intended Use
This model is suitable for general language understanding and generation tasks, particularly where its gains over the base Llama-2-13b model in common-sense reasoning (HellaSwag) and truthfulness (TruthfulQA) are beneficial.
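For text generation, a minimal loading sketch with the Hugging Face `transformers` library might look like the following. It assumes the repository hosts weights that `AutoModelForCausalLM` can load directly, and that `torch`, `transformers`, and `accelerate` (needed for `device_map="auto"`) are installed; the card itself does not document a loading procedure. The `generate` helper is an illustrative name.

```python
MODEL_ID = "CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and complete a prompt.

    Heavy imports live inside the function so the module can be imported
    without the libraries installed. For repeated calls, load the model
    once outside the function instead of on every call.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # the card lists bf16 as training precision
        device_map="auto",           # requires the accelerate package
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("The capital of France is"))
```

In bf16 a 13-billion-parameter model needs roughly 26 GB for weights alone, so loading on smaller GPUs would require quantization or offloading.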