CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w

Text Generation | Concurrency Cost: 1 | Model Size: 13B | Quant: FP8 | Ctx Length: 4k | License: llama2 | Architecture: Transformer | Open Weights | Cold

CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w is a 13 billion parameter language model fine-tuned from Meta's Llama-2-13b on the huangyt/FINETUNE3 dataset, which contains approximately 33,000 data points. It was trained with LoRA on a single RTX 4090 GPU, with the aim of improving performance on general language understanding benchmarks. It scores higher than its base model, Llama-2-13b, most notably on HellaSwag and TruthfulQA.

Model Overview

CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w is a fine-tuned version of the 13 billion parameter Llama-2-13b model. It was trained by CHIH-HUNG using the huangyt/FINETUNE3 dataset, which consists of approximately 33,000 data entries. The fine-tuning process utilized LoRA (Low-Rank Adaptation) with a rank of 16, targeting gate_proj, up_proj, and down_proj layers.
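To give a sense of how small a rank-16 adapter on these three projections is, the following sketch counts the LoRA parameters it would add. The layer dimensions (hidden size 5120, intermediate size 13824, 40 layers) are taken from the published Llama-2-13b configuration and are an assumption here, since the card does not list them. The helper name lora_params is hypothetical, and the count assumes one adapter per targeted projection in every layer.

```python
# Llama-2-13b dimensions. Assumption: taken from the published base-model
# config, not stated in this model card.
HIDDEN_SIZE = 5120
INTERMEDIATE_SIZE = 13824
NUM_LAYERS = 40

LORA_RANK = 16  # from the model card

# (in_features, out_features) of each targeted MLP projection.
TARGET_SHAPES = {
    "gate_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "up_proj": (HIDDEN_SIZE, INTERMEDIATE_SIZE),
    "down_proj": (INTERMEDIATE_SIZE, HIDDEN_SIZE),
}


def lora_params(in_features, out_features, rank):
    """Parameters in one adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_RANK) for i, o in TARGET_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS

print(f"adapter parameters per layer: {per_layer:,}")
print(f"adapter parameters in total:  {total_trainable:,}")
```

Under these assumptions the adapters add roughly 36 million trainable parameters, well under 1% of the 13 billion in the base model, which helps explain why training fits on a single consumer GPU.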

Fine-Tuning Details

  • Base Model: meta-llama/Llama-2-13b-hf
  • Dataset: huangyt/FINETUNE3 (approx. 33,000 samples)
  • Hardware: Single RTX4090 GPU
  • PEFT Type: LoRA
  • Training Parameters:
    • lora_rank: 16
    • per_device_train_batch_size: 8
    • gradient_accumulation_steps: 8
    • learning_rate: 4e-4
    • epoch: 1
    • precision: bf16
    • quantization: load_in_4bit
  • Training Loss: 0.579
  • Training Time: 4 hours and 6 minutes, using DeepSpeed
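The batch settings above imply an effective batch size and an approximate number of optimizer steps, which can be sketched as follows. This is a rough estimate only: the dataset size is approximate, the card does not say whether the last partial batch is kept (this sketch keeps it), and the reported wall-clock time may include loading and checkpointing overhead.

```python
import math

# Values from the model card.
DATASET_SIZE = 33_000          # approximate
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 8
NUM_GPUS = 1
EPOCHS = 1
TRAIN_SECONDS = 4 * 3600 + 6 * 60  # 4 hours and 6 minutes

# Samples consumed per optimizer update.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS

# Assumption: the final partial batch still counts as one step.
steps_per_epoch = math.ceil(DATASET_SIZE / effective_batch)
total_steps = steps_per_epoch * EPOCHS

# Average wall-clock time per optimizer step, overhead included.
seconds_per_step = TRAIN_SECONDS / total_steps

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps:      {total_steps}")
print(f"seconds per step:     {seconds_per_step:.1f}")
```

With these figures, each update sees 64 samples, and one epoch works out to a little over 500 optimizer steps.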

Evaluation & Performance

Evaluations were conducted against the Llama-2-13b and Llama-2-13b-chat-hf models across four benchmarks: ARC, HellaSwag, MMLU, and TruthfulQA. The fine-tuned model achieved an average score of 58.9, showing improvements over the base Llama-2-13b in HellaSwag (82.38 vs 80.97) and TruthfulQA (39.73 vs 34.17), while maintaining similar performance in ARC and MMLU.

Model                                 Average  ARC    HellaSwag  MMLU   TruthfulQA
meta-llama/Llama-2-13b-hf             56.9     58.11  80.97      54.34  34.17
meta-llama/Llama-2-13b-chat-hf        59.93    59.04  81.94      54.64  44.12
CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w  58.9     58.95  82.38      54.56  39.73
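The Average column appears to be the unweighted mean of the four benchmark scores. The sketch below checks that reading and computes the per-benchmark gain of the fine-tuned model over the base model. The scores are copied from the table; the names SCORES and average are illustrative.

```python
# Benchmark scores copied from the table above.
SCORES = {
    "meta-llama/Llama-2-13b-hf": {
        "ARC": 58.11, "HellaSwag": 80.97, "MMLU": 54.34, "TruthfulQA": 34.17,
    },
    "meta-llama/Llama-2-13b-chat-hf": {
        "ARC": 59.04, "HellaSwag": 81.94, "MMLU": 54.64, "TruthfulQA": 44.12,
    },
    "CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w": {
        "ARC": 58.95, "HellaSwag": 82.38, "MMLU": 54.56, "TruthfulQA": 39.73,
    },
}


def average(scores):
    """Unweighted mean over the benchmarks in `scores`."""
    return sum(scores.values()) / len(scores)


base = SCORES["meta-llama/Llama-2-13b-hf"]
finetuned = SCORES["CHIH-HUNG/llama-2-13b-FINETUNE3_3.3w"]

# Gain of the fine-tuned model over the base model, per benchmark.
deltas = {name: round(finetuned[name] - base[name], 2) for name in base}

for model, scores in SCORES.items():
    print(f"{model}: {average(scores):.2f}")
print("gain over base:", deltas)
```

The largest gain over the base model is on TruthfulQA, while the ARC and MMLU differences are under one point.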

Intended Use

This model is suited to general language understanding and generation tasks, particularly those that benefit from its gains over the base Llama-2-13b model in common-sense reasoning (HellaSwag) and truthfulness (TruthfulQA).