CHIH-HUNG/llama-2-13b-dolphin_5w

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quantization: FP8 · Context Length: 4k · Published: Aug 25, 2023 · License: llama2 · Architecture: Transformer · Open Weights

CHIH-HUNG/llama-2-13b-dolphin_5w is a 13 billion parameter language model fine-tuned by CHIH-HUNG on the Meta Llama 2 architecture. It was trained using the first 50,000 entries of the ehartford/dolphin dataset, focusing on instruction-following tasks. This model demonstrates improved performance across benchmarks like ARC, HellaSwag, MMLU, and TruthfulQA compared to its base Llama-2-13b counterpart, making it suitable for general conversational and question-answering applications.


Model Overview

CHIH-HUNG/llama-2-13b-dolphin_5w is a 13 billion parameter language model built upon the Meta Llama 2 architecture. It has been fine-tuned by CHIH-HUNG using a subset of the ehartford/dolphin dataset, specifically the first 50,000 entries, to enhance its instruction-following capabilities.

Fine-Tuning Details

The model was fine-tuned with LoRA (rank 8) applied to the q_proj and v_proj layers, using a learning rate of 5e-5 for 1 epoch. Training ran on a single RTX 4090 GPU with bf16 compute precision and 4-bit quantized base weights, reaching a final training loss of 0.8799 in a runtime of about 7 hours and 11 minutes.
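The reported setup (rank-8 LoRA on q_proj/v_proj, 4-bit base weights, bf16 compute) corresponds closely to a standard QLoRA recipe. A minimal sketch using the Hugging Face transformers and peft libraries is shown below; hyperparameters beyond those listed in the model card (lora_alpha, dropout, batch size, and so on) are assumptions, not documented values.

```python
# Minimal QLoRA-style fine-tuning sketch matching the reported setup.
# Only rank 8, q_proj/v_proj targets, lr 5e-5, 1 epoch, bf16 compute, and
# 4-bit quantization are stated in the model card; everything else
# (lora_alpha, dropout, batch size) is an assumed placeholder.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-2-13b-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit quantized base weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute, as reported
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=8,                                    # LoRA rank 8 (stated)
    lora_alpha=16,                          # assumed
    lora_dropout=0.05,                      # assumed
    target_modules=["q_proj", "v_proj"],    # stated target layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="llama-2-13b-dolphin-5w",
    learning_rate=5e-5,                     # stated learning rate
    num_train_epochs=1,                     # stated single epoch
    bf16=True,
    per_device_train_batch_size=4,          # assumed
    gradient_accumulation_steps=4,          # assumed
    logging_steps=50,
)
# A Trainer (or trl's SFTTrainer) would then be run over the first
# 50,000 entries of the ehartford/dolphin dataset.
```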

Performance Benchmarks

Evaluated on the HuggingFaceH4/open_llm_leaderboard, CHIH-HUNG/llama-2-13b-dolphin_5w shows competitive performance across several benchmarks when compared to the base Llama-2-13b and other dolphin-tuned variants:

  • Average Score: 61.0 (highest among compared models)
  • ARC: 60.67
  • HellaSwag: 82.69
  • MMLU: 56.23
  • TruthfulQA: 44.41

This model notably surpasses the base meta-llama/Llama-2-13b-hf and meta-llama/Llama-2-13b-chat-hf in both average score and the individual benchmarks, indicating improved general reasoning and instruction adherence.
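The Open LLM Leaderboard computes these scores with EleutherAI's lm-evaluation-harness. A local run along the following lines should produce comparable numbers, though exact values can differ with harness version and prompt formatting; the task names and few-shot counts below follow the original leaderboard configuration and are assumptions as far as this model card is concerned.

```python
# Hedged sketch: reproducing leaderboard-style scores locally with
# EleutherAI's lm-evaluation-harness (Python API, v0.4+ assumed).
import lm_eval

# Few-shot counts follow the original Open LLM Leaderboard settings.
task_settings = {
    "arc_challenge": 25,   # ARC, 25-shot
    "hellaswag": 10,       # HellaSwag, 10-shot
    "mmlu": 5,             # MMLU, 5-shot
    "truthfulqa_mc2": 0,   # TruthfulQA, 0-shot
}

for task, shots in task_settings.items():
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=CHIH-HUNG/llama-2-13b-dolphin_5w,dtype=bfloat16",
        tasks=[task],
        num_fewshot=shots,
        batch_size=8,
    )
    print(task, results["results"][task])
```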

Recommended Use Cases

This model is well-suited for applications requiring robust instruction following and general conversational AI, particularly where the performance-to-compute balance of a 13B parameter model is desired. Its strong benchmark results suggest proficiency in common language understanding and generation tasks.
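For such use, the model can be loaded with the Hugging Face transformers library as a standard causal language model, as in the sketch below. The instruction/response prompt layout shown is an assumption (the model card does not document a specific chat template) and should be adapted to whatever format a downstream application expects.

```python
# Hedged inference sketch: loading the model as a standard causal LM.
# The "### Instruction / ### Response" prompt layout is an assumed format,
# not one documented by the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CHIH-HUNG/llama-2-13b-dolphin_5w"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "### Instruction:\n"
    "Explain the difference between a list and a tuple in Python.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, skipping the prompt.
print(
    tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
)
```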