CHIH-HUNG/llama-2-13b-FINETUNE1_17w-gate_up_down_proj

Text generation · Concurrency cost: 1 · Model size: 13B · Quantization: FP8 · Context length: 4K · Published: Sep 3, 2023 · License: llama2 · Architecture: Transformer · Open weights

The CHIH-HUNG/llama-2-13b-FINETUNE1_17w-gate_up_down_proj model is a 13-billion-parameter, Llama-2-based language model fine-tuned by CHIH-HUNG on the huangyt/FINETUNE1 dataset of approximately 170,000 examples. The LoRA fine-tune specifically targets the gate_proj, up_proj, and down_proj MLP projection layers. It outperforms the base Llama-2-13b model on benchmarks such as HellaSwag, MMLU, and TruthfulQA, making it suitable for general language understanding and generation tasks.


Overview

This model, CHIH-HUNG/llama-2-13b-FINETUNE1_17w-gate_up_down_proj, is a 13-billion-parameter language model built on the meta-llama/Llama-2-13b-hf base model. It was fine-tuned by CHIH-HUNG on the huangyt/FINETUNE1 dataset, which contains approximately 170,000 training examples.

Fine-Tuning Details

The fine-tuning process used LoRA (Low-Rank Adaptation) with a rank of 8, specifically targeting the gate_proj, up_proj, and down_proj MLP projection layers. Training ran for 1 epoch on a single RTX 4090 GPU with a batch size of 8, a learning rate of 5e-5, and bf16 precision with 4-bit quantization, reaching a final loss of 0.66 after 16 hours and 24 minutes.
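
For reference, this setup can be approximated with the Transformers and PEFT libraries. The following is a minimal sketch, not the author's actual training script; LoRA hyperparameters beyond the rank (e.g., alpha, dropout) are not stated in the card and are left at library defaults here.

```python
# Minimal sketch of the described setup using Transformers + PEFT.
# Not the author's actual script; unstated hyperparameters use library defaults.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization with bf16 compute, matching the reported precision setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA rank 8, applied only to the MLP projections named in the model card
lora_config = LoraConfig(
    r=8,
    target_modules=["gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```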

Performance Evaluation

Evaluations against the HuggingFaceH4/open_llm_leaderboard benchmarks show improvements compared to the base Llama-2-13b model:

  • Average: 58.81 (base: 56.9)
  • HellaSwag: 82.26 (base: 80.97)
  • MMLU: 55.89 (base: 54.34)
  • TruthfulQA: 39.93 (base: 34.17)

While the ARC score decreased slightly, the model overall shows stronger common-sense reasoning, multi-task language understanding, and factual accuracy than its base counterpart.
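
The Open LLM Leaderboard scores above are produced with EleutherAI's lm-evaluation-harness. Below is a rough reproduction sketch using the harness's Python API; the task names and default few-shot settings are assumptions, and the exact harness version behind the leaderboard numbers may differ.

```python
# Rough reproduction sketch using lm-evaluation-harness (pip install lm-eval).
# Task names and few-shot settings are assumptions; leaderboard settings may differ.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=CHIH-HUNG/llama-2-13b-FINETUNE1_17w-gate_up_down_proj,"
        "dtype=bfloat16"
    ),
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
)
print(results["results"])
```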

Usage

This model is suited to general language understanding and generation tasks, benefiting from fine-tuning on a substantial dataset and targeted LoRA adaptation of the MLP projections.
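
A minimal inference sketch with the Transformers library is shown below; the prompt and generation parameters are illustrative, not taken from the model card.

```python
# Minimal inference sketch; prompt and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CHIH-HUNG/llama-2-13b-FINETUNE1_17w-gate_up_down_proj"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Explain the difference between supervised and unsupervised learning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```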