CHIH-HUNG/llama-2-13b-FINETUNE2_3w-q_k_v_o_proj

Text generation · Model size: 13B · Quantization: FP8 · Context length: 4k · Published: Sep 2, 2023 · License: llama2 · Architecture: Transformer · Open weights

CHIH-HUNG/llama-2-13b-FINETUNE2_3w-q_k_v_o_proj is a 13 billion parameter Llama-2-based language model fine-tuned by CHIH-HUNG using the huangyt/FINETUNE2 dataset, comprising approximately 30,000 data entries. This model was fine-tuned with LoRA targeting the q_proj, k_proj, v_proj, and o_proj layers. It demonstrates competitive performance across benchmarks like ARC, HellaSwag, MMLU, and TruthfulQA compared to its base Llama-2-13b counterpart, with a context length of 4096 tokens.


Model Overview

CHIH-HUNG/llama-2-13b-FINETUNE2_3w-q_k_v_o_proj is a 13 billion parameter language model built upon the meta-llama/Llama-2-13b-hf architecture. It was fine-tuned by CHIH-HUNG using the huangyt/FINETUNE2 dataset, which contains approximately 30,000 training examples.

Fine-Tuning Details

The fine-tuning process utilized LoRA (Low-Rank Adaptation) with a rank of 8, specifically targeting the q_proj, k_proj, v_proj, and o_proj attention projection layers. Training was conducted for 1 epoch with a learning rate of 5e-5, using bf16 precision and load_in_4bit quantization. The training loss achieved was 0.65 over a runtime of approximately 3 hours and 33 minutes.
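As a rough sanity check, the number of trainable parameters that rank-8 LoRA adds over these four projections can be computed from the base model's dimensions. The per-layer figures below are assumptions drawn from the standard Llama-2-13b architecture (hidden size 5120, 40 decoder layers), not from the fine-tuning repository itself:

```python
# Estimate trainable parameters for rank-8 LoRA on q/k/v/o projections.
# Assumes Llama-2-13b dimensions: hidden_size=5120, 40 layers, and that
# all four attention projections map 5120 -> 5120 (true for the 13B base).
HIDDEN = 5120
LAYERS = 40
RANK = 8
TARGETS = 4  # q_proj, k_proj, v_proj, o_proj

# Each adapted matrix gains two low-rank factors: A (r x d_in) and B (d_out x r).
params_per_matrix = RANK * (HIDDEN + HIDDEN)
trainable = params_per_matrix * TARGETS * LAYERS
print(f"{trainable:,} trainable LoRA parameters")  # 13,107,200 (~0.1% of 13B)
```

Only this small fraction of weights is updated during training, which is what makes the combination of LoRA with load_in_4bit quantization feasible on modest hardware.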

Performance Benchmarks

Evaluation against the HuggingFaceH4/open_llm_leaderboard benchmarks shows the model's performance relative to the base Llama-2-13b model. Its average score is slightly higher than the base model's, with clear gains on HellaSwag and TruthfulQA: it achieved 82.47 on HellaSwag and 37.92 on TruthfulQA, versus 80.97 and 34.17 respectively for the base Llama-2-13b-hf.
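The gains over the base checkpoint quoted above work out as follows (only the two benchmark pairs stated on this card are reproduced; the remaining leaderboard scores are omitted here):

```python
# Score deltas vs. meta-llama/Llama-2-13b-hf, using the figures quoted above.
scores = {
    "HellaSwag":  {"finetuned": 82.47, "base": 80.97},
    "TruthfulQA": {"finetuned": 37.92, "base": 34.17},
}

for bench, s in scores.items():
    delta = round(s["finetuned"] - s["base"], 2)
    print(f"{bench}: +{delta}")  # HellaSwag: +1.5, TruthfulQA: +3.75
```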

Use Cases

This model is suitable for tasks requiring a Llama-2-13b variant that has undergone specific fine-tuning on a custom dataset, potentially offering specialized knowledge or improved performance in areas covered by the huangyt/FINETUNE2 dataset. Developers can leverage its fine-tuned capabilities for applications where a 13B parameter model with a 4096-token context window is appropriate.
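A minimal loading sketch using the Hugging Face transformers stack is shown below. Only the model id and the 4096-token context window come from this card; the 4-bit loading, dtype, and generation settings are illustrative assumptions:

```python
MODEL_ID = "CHIH-HUNG/llama-2-13b-FINETUNE2_3w-q_k_v_o_proj"
MAX_CTX = 4096  # context window stated on this card


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model in 4-bit and generate a completion (downloads 13B weights)."""
    # Imports kept local so the sketch can be read without the heavy deps installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        load_in_4bit=True,           # mirrors the quantization used during fine-tuning
        torch_dtype=torch.bfloat16,  # matches the bf16 training precision
        device_map="auto",
    )
    # Truncate the prompt so prompt + new tokens stay inside the 4k window.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_CTX - max_new_tokens,
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Since the LoRA weights were merged into a full checkpoint, no separate PEFT adapter loading step is needed; the model loads like any other Llama-2 variant.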