Model Overview
CHIH-HUNG/llama-2-13b-FINETUNE1_17w is a 13-billion-parameter language model built on the meta-llama/Llama-2-13b-hf architecture. It was fine-tuned by CHIH-HUNG on the huangyt/FINETUNE1 dataset, which consists of approximately 170,000 training examples. Fine-tuning used LoRA (Low-Rank Adaptation) with rank 8 applied to the q_proj and v_proj attention projections, and ran for one epoch on a single RTX 4090 GPU.
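As a back-of-the-envelope sketch of what a rank-8 adapter on q_proj and v_proj means in trainable-parameter terms (this helper is illustrative, not the training script; hidden size 5120 and 40 layers are the standard Llama-2-13b dimensions):

```python
# Rough count of trainable LoRA parameters for this setup.
# LoRA adds two low-rank matrices, A (r x d_in) and B (d_out x r),
# per targeted projection; only these are trained.

def lora_param_count(rank: int, d_in: int, d_out: int,
                     targets_per_layer: int, num_layers: int) -> int:
    """Total trainable parameters added by the LoRA adapters."""
    per_projection = rank * (d_in + d_out)  # A and B combined
    return per_projection * targets_per_layer * num_layers

# Rank 8, targeting q_proj and v_proj (2 modules per layer),
# Llama-2-13b: hidden_size = 5120, 40 transformer layers.
total = lora_param_count(rank=8, d_in=5120, d_out=5120,
                         targets_per_layer=2, num_layers=40)
print(total)         # 6553600 trainable parameters
print(total / 13e9)  # well under 0.1% of the 13B base weights
```

This tiny trainable fraction is why the run fits on a single consumer GPU.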
Key Capabilities & Performance
Compared with the original Llama-2-13b base model, the fine-tuned model improves across several benchmarks. Evaluation results, sourced from the HuggingFaceH4/open_llm_leaderboard, indicate:
- Average Score: Achieves an average score of 58.71, surpassing the base Llama-2-13b's 56.9.
- MMLU: Shows a notable increase in MMLU (Massive Multitask Language Understanding) score to 56.18, up from 54.34.
- TruthfulQA: Improves TruthfulQA performance to 39.65, compared to 34.17 for the base model.
- HellaSwag: Maintains strong performance on HellaSwag at 82.27.
Training Details
The model was trained with a per_device_train_batch_size of 8 and gradient_accumulation_steps of 8, using a learning rate of 5e-5. The training was conducted in bf16 precision with 4-bit quantization (load_in_4bit) and utilized DeepSpeed, resulting in a train_loss of 0.707 over a train_runtime of approximately 15 hours.
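As a quick sanity check on these hyperparameters, the effective batch size and the approximate number of optimizer steps for one epoch can be derived directly (the ~170,000-example dataset size is the figure stated above):

```python
# Effective batch size and approximate optimizer steps for one epoch.
per_device_train_batch_size = 8
gradient_accumulation_steps = 8
num_examples = 170_000  # approximate size of huangyt/FINETUNE1

# Gradients accumulate over 8 micro-batches of 8 before each update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = num_examples // effective_batch

print(effective_batch)   # 64
print(steps_per_epoch)   # about 2656 optimizer steps in the single epoch
```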
Intended Use Cases
This model is suitable for applications requiring enhanced general language understanding, improved reasoning, and better factual recall, particularly in scenarios where the huangyt/FINETUNE1 dataset's characteristics align with the target domain. Its improved benchmark scores suggest it can be a more capable alternative to the base Llama-2-13b for various text generation and comprehension tasks.
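A minimal way to try the model with the Hugging Face transformers library might look like the sketch below. The prompt is illustrative; 4-bit loading (as used during training) is optional but keeps memory within a single consumer GPU.

```python
# Illustrative inference sketch; requires transformers, accelerate, and
# bitsandbytes, plus enough GPU memory for a 13B model in 4-bit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CHIH-HUNG/llama-2-13b-FINETUNE1_17w"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_4bit=True,  # optional: 4-bit weights, matching the training setup
)

prompt = "Briefly explain what low-rank adaptation (LoRA) does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```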