Model Overview
CHIH-HUNG/llama-2-13b-FINETUNE2_3w-q_k_v_o_proj is a 13-billion-parameter language model built on the meta-llama/Llama-2-13b-hf architecture. It was fine-tuned by CHIH-HUNG on the huangyt/FINETUNE2 dataset, which contains approximately 30,000 training examples.
Fine-Tuning Details
Fine-tuning used LoRA (Low-Rank Adaptation) with a rank of 8, targeting the q_proj, k_proj, v_proj, and o_proj attention projection layers. Training ran for 1 epoch with a learning rate of 5e-5, using bf16 precision and load_in_4bit quantization, and reached a final training loss of 0.65 over a runtime of approximately 3 hours and 33 minutes.
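To give a sense of scale, here is a back-of-the-envelope count of the trainable parameters this LoRA setup introduces. It is a sketch, not the training script: it assumes the standard Llama-2-13b configuration (hidden size 5120, 40 layers, full multi-head attention, so all four targeted projections are 5120 x 5120).

```python
# Rough count of trainable LoRA parameters for rank-8 adapters on the
# q_proj, k_proj, v_proj, and o_proj layers of Llama-2-13b.
HIDDEN = 5120   # Llama-2-13b hidden size
LAYERS = 40     # Llama-2-13b decoder layers
RANK = 8        # LoRA rank used in this fine-tune
TARGETS = 4     # q_proj, k_proj, v_proj, o_proj

# Each adapted projection W (d x d) gains two low-rank factors:
# A with shape (r x d) and B with shape (d x r).
params_per_proj = RANK * HIDDEN + HIDDEN * RANK   # 81,920
trainable = params_per_proj * TARGETS * LAYERS    # 13,107,200

print(f"trainable LoRA params: {trainable:,}")
print(f"fraction of ~13B base weights: {trainable / 13_000_000_000:.4%}")
```

In other words, only about 13.1M parameters (roughly 0.1% of the base model) are updated, which is what makes this kind of targeted fine-tune cheap relative to full fine-tuning.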
Performance Benchmarks
Evaluation against the HuggingFaceH4/open_llm_leaderboard benchmarks shows the model's performance relative to the base Llama-2-13b model. Its average score is slightly higher than the base model's, with the clearest gains on HellaSwag and TruthfulQA: it achieved 82.47 on HellaSwag and 37.92 on TruthfulQA, compared to 80.97 and 34.17 respectively for the base Llama-2-13b-hf.
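The deltas behind those numbers are easy to make explicit. The scores below are copied from the figures above; the snippet only computes the differences.

```python
# Per-benchmark gains of the fine-tuned model over base Llama-2-13b-hf,
# using the scores reported on the card: (fine-tuned, base).
scores = {
    "HellaSwag": (82.47, 80.97),
    "TruthfulQA": (37.92, 34.17),
}

deltas = {name: round(ft - base, 2) for name, (ft, base) in scores.items()}
print(deltas)  # HellaSwag gains +1.50 points, TruthfulQA +3.75
```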
Use Cases
This model is suitable for tasks requiring a Llama-2-13b variant that has undergone specific fine-tuning on a custom dataset, potentially offering specialized knowledge or improved performance in areas covered by the huangyt/FINETUNE2 dataset. Developers can leverage its fine-tuned capabilities for applications where a 13B parameter model with a 4096-token context window is appropriate.
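A minimal loading sketch with the transformers library is shown below. The model id and the 4096-token context length come from this card; everything else (the prompt template, dtype, and generation settings) is an illustrative assumption — check the dataset card for the actual prompt format used during fine-tuning.

```python
# Hypothetical usage sketch for the fine-tuned checkpoint. Only the model id
# and context length are taken from the card; the prompt template and
# generation settings below are assumptions.
MODEL_ID = "CHIH-HUNG/llama-2-13b-FINETUNE2_3w-q_k_v_o_proj"
CONTEXT_WINDOW = 4096  # Llama-2 context length


def build_prompt(instruction: str) -> str:
    """Format a single-turn prompt. This Alpaca-style template is an
    assumption -- verify against the huangyt/FINETUNE2 dataset card."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


def generate(instruction: str, max_new_tokens: int = 256) -> str:
    # Heavyweight imports live inside the function so the sketch can be
    # read (and the prompt helper tested) without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Note that loading a 13B model in bf16 requires roughly 26 GB of accelerator memory; `device_map="auto"` lets transformers shard or offload across available devices.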