Overview
CHIH-HUNG/llama-2-13b-FINETUNE1_17w-q_k_v_o_proj is a fine-tuned variant of the 13-billion-parameter Llama-2 base model. Developed by CHIH-HUNG, it was trained on the huangyt/FINETUNE1 dataset, which contains approximately 170,000 entries.
Fine-Tuning Details
Fine-tuning was performed on an RTX 4090 GPU using LoRA (Low-Rank Adaptation) with rank 8, targeting the q_proj, k_proj, v_proj, and o_proj attention projection layers. Training ran for 1 epoch at a learning rate of 5e-5, using bf16 precision and 4-bit quantization for memory efficiency. The final training loss was 0.688 over a runtime of 15 hours and 44 minutes.
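To make the adapter setup concrete, here is a minimal pure-Python sketch of the LoRA weight update applied to one targeted projection matrix (e.g. q_proj): the frozen weight W is augmented by a scaled low-rank product (alpha / r) * B @ A. The matrix sizes and alpha value below are toy illustrations, not the actual Llama-2-13b dimensions or this run's hyperparameters (which used rank 8); real training would use the peft library rather than hand-rolled matrices.

```python
def lora_delta(A, B, alpha, r):
    """Compute (alpha / r) * B @ A for plain nested-list matrices.

    A has shape (r, d_in); B has shape (d_out, r).
    """
    cols = len(A[0])
    scale = alpha / r
    return [[scale * sum(B[i][k] * A[k][j] for k in range(r))
             for j in range(cols)] for i in range(len(B))]

def apply_lora(W, A, B, alpha, r):
    """Return the effective fine-tuned weight W + (alpha / r) * B @ A."""
    delta = lora_delta(A, B, alpha, r)
    return [[W[i][j] + delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy 2x2 weight with rank-2 adapters (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.1, 0.0], [0.0, 0.1]]   # r x d_in
B = [[0.5, 0.0], [0.0, 0.5]]   # d_out x r
W_eff = apply_lora(W, A, B, alpha=2, r=2)
# W_eff == [[1.05, 0.0], [0.0, 1.05]]
```

Because only A and B are trained while W stays frozen, the number of trainable parameters per targeted layer drops from d_out * d_in to r * (d_out + d_in), which is what makes fine-tuning a 13B model on a single consumer GPU feasible.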
Performance Benchmarks
Evaluations against the HuggingFaceH4/open_llm_leaderboard benchmarks show that this fine-tuned model generally outperforms the base meta-llama/Llama-2-13b-hf model across several key metrics:
- Average Score: 58.49 (compared to 56.9 for base Llama-2-13b)
- ARC: 59.73
- HellaSwag: 81.06
- MMLU: 54.53
- TruthfulQA: 38.64
These results indicate an overall improvement over the original Llama-2-13b in reasoning (ARC), common-sense inference (HellaSwag), and truthfulness (TruthfulQA).
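The leaderboard's Average Score is simply the mean of the four task scores, which can be checked against the numbers reported above:

```python
# Arithmetic check: the Average Score is the mean of the four benchmarks.
scores = {"ARC": 59.73, "HellaSwag": 81.06, "MMLU": 54.53, "TruthfulQA": 38.64}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 58.49, matching the reported Average Score
```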
Recommended Use Cases
This model suits applications that need stronger general language understanding and generation than the base Llama-2-13b provides. Its fine-tuning on a large, diverse dataset makes it applicable to conversational AI, text summarization, and question-answering scenarios.