Overview
This model, CHIH-HUNG/llama-2-13b-FINETUNE5_4w-r4-q_k_v_o, is a 13-billion-parameter variant of the Llama-2 architecture. It was fine-tuned by CHIH-HUNG with LoRA (Low-Rank Adaptation) on the huangyt/FINETUNE5 dataset, which contains approximately 40,000 training examples. Training ran on an RTX 4090 GPU with a LoRA rank of 4 (the r4 in the model name) applied to the q_proj, k_proj, v_proj, and o_proj projection layers, a learning rate of 4e-4, and one epoch in bf16 precision with 4-bit quantization.
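A minimal sketch of an equivalent setup using the Hugging Face peft library's LoraConfig. The rank and target modules come from the card above; lora_alpha and lora_dropout are illustrative assumptions, since the card does not state them:

```python
from peft import LoraConfig, TaskType

# LoRA configuration mirroring the card's stated hyperparameters.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=4,               # low-rank dimension (the "r4" suffix in the model name)
    lora_alpha=16,     # assumed scaling factor, not confirmed by the card
    lora_dropout=0.05, # assumed dropout, not confirmed by the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# Per the card, training then used lr=4e-4 for one epoch, bf16 compute,
# and 4-bit quantized base weights (QLoRA-style).
```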
Performance Benchmarks
The model's performance was evaluated against the base Llama-2-13b across four standard benchmarks, with scores measured locally using 8-bit quantization:
- Average Score: 56.09
- ARC: 54.35
- HellaSwag: 79.24
- MMLU: 54.01
- TruthfulQA: 36.75
These results reflect its capabilities in reasoning (ARC), commonsense inference (HellaSwag), broad knowledge (MMLU), and truthfulness (TruthfulQA). The fine-tuning run, accelerated with DeepSpeed, reached a train_loss of 0.579 in roughly 4 hours and 6 minutes.
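The reported average is simply the arithmetic mean of the four benchmark scores; a quick check:

```python
# Benchmark scores from the card (measured locally with 8-bit quantization).
scores = {
    "ARC": 54.35,
    "HellaSwag": 79.24,
    "MMLU": 54.01,
    "TruthfulQA": 36.75,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 56.09, matching the reported average
```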
Key Differentiators
- Targeted Fine-tuning: Specialized LoRA fine-tuning on a custom dataset (huangyt/FINETUNE5) for potentially improved performance on tasks aligned with the dataset's content.
- Efficient Training: Utilizes LoRA with 4-bit quantization and bf16 precision, making it efficient for deployment and further fine-tuning on consumer-grade hardware.
Recommended Use Cases
This model is suitable for applications requiring a Llama-2-13b base with enhanced performance on tasks similar to those found in the huangyt/FINETUNE5 dataset. It can be used for general text generation, question answering, and language-understanding tasks where a balance between model size and quality is desired.
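A minimal inference sketch with transformers, assuming the repository hosts merged full weights loadable via the standard AutoModelForCausalLM API (if it instead publishes only the LoRA adapter, it would be loaded with peft.PeftModel on top of the Llama-2-13b base). Note that a 13B model requires substantial disk space and GPU memory:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the repo contains merged weights, not just an adapter.
model_id = "CHIH-HUNG/llama-2-13b-FINETUNE5_4w-r4-q_k_v_o"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bf16 training precision
    device_map="auto",
)

prompt = "Question: What is low-rank adaptation?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```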