CHIH-HUNG/llama-2-13b-FINETUNE1_17w-q_k_v_o_proj

Text generation · 13B parameters · FP8 quantization · 4k context · Published: Sep 3, 2023 · License: llama2 · Architecture: Transformer · Open weights

CHIH-HUNG/llama-2-13b-FINETUNE1_17w-q_k_v_o_proj is a 13-billion-parameter language model based on Llama-2, fine-tuned by CHIH-HUNG. It was trained on the huangyt/FINETUNE1 dataset of approximately 170,000 examples using LoRA targeting the q_proj, k_proj, v_proj, and o_proj attention projection layers. The model improves on the base Llama-2-13b across benchmarks such as ARC, HellaSwag, MMLU, and TruthfulQA, making it suitable for general language understanding and generation tasks.


Overview

This model, CHIH-HUNG/llama-2-13b-FINETUNE1_17w-q_k_v_o_proj, is a fine-tuned variant of the 13 billion parameter Llama-2 base model. Developed by CHIH-HUNG, it leverages the huangyt/FINETUNE1 dataset, which consists of approximately 170,000 data entries, to enhance its capabilities.

Fine-Tuning Details

Fine-tuning was performed on a single RTX 4090 GPU using LoRA (Low-Rank Adaptation) with rank 8, targeting the q_proj, k_proj, v_proj, and o_proj attention projection layers. Training ran for 1 epoch with a learning rate of 5e-5, using bf16 precision and 4-bit quantization of the base weights for memory efficiency. The final training loss was 0.688 after a runtime of 15 hours and 44 minutes.
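To illustrate what a rank-8 LoRA update does to one of these projection layers, the sketch below merges a low-rank delta into a frozen weight matrix in plain Python. Shapes are toy-sized (the real q/k/v/o projections in Llama-2-13b are 5120×5120), and the `alpha` value is an assumption for illustration only, since the card does not state it:

```python
import random

def matmul(A, B):
    """Naive matrix multiply, for illustration only."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), the merged LoRA weight.

    W: (d_out, d_in) frozen base weight
    B: (d_out, r) adapter, initialized to zero
    A: (r, d_in) adapter, randomly initialized
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy dimensions; rank 8 matches the card, alpha=16 is assumed.
d, r, alpha = 16, 8, 16
random.seed(0)
W = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(d)]
A = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]  # zero init, so the delta starts at zero

W_merged = merge_lora(W, A, B, alpha, r)
# With B = 0 the merge is a no-op: the adapted model starts out
# identical to the base model, and only A and B are trained.
```

Because only the two small adapter matrices (d×r and r×d per layer) are trained, this is what makes fine-tuning a 13B model feasible on a single RTX 4090.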

Performance Benchmarks

Evaluations against the HuggingFaceH4/open_llm_leaderboard benchmarks show that this fine-tuned model generally outperforms the base meta-llama/Llama-2-13b-hf model across several key metrics:

  • Average Score: 58.49 (compared to 56.9 for base Llama-2-13b)
  • ARC: 59.73
  • HellaSwag: 81.06
  • MMLU: 54.53
  • TruthfulQA: 38.64
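The reported average is simply the arithmetic mean of the four benchmark scores, which can be checked directly:

```python
# Scores from the Open LLM Leaderboard evaluation above
scores = {"ARC": 59.73, "HellaSwag": 81.06, "MMLU": 54.53, "TruthfulQA": 38.64}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 58.49, matching the reported average
```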

These results indicate improvements over the original Llama-2-13b in reasoning (ARC), commonsense inference (HellaSwag), broad knowledge (MMLU), and truthfulness (TruthfulQA).

Recommended Use Cases

This model is well suited to applications that need stronger general language understanding and generation than the base Llama-2-13b provides. Its fine-tuning on a diverse dataset suggests applicability to conversational AI, text summarization, and question-answering scenarios.