YeungNLP/firefly-llama-13b

Text Generation · Model size: 13B · Quantization: FP8 · Context length: 4K · Published: Jul 13, 2023 · Architecture: Transformer · Concurrency cost: 1

YeungNLP/firefly-llama-13b is a 13 billion parameter Llama-based language model developed by YeungNLP, fine-tuned with the QLoRA method on the UltraChat dataset, which comprises approximately 1.4 million multi-turn dialogues. The model is notable for its efficient training, requiring only a single GPU, and achieves competitive results on the Open LLM Leaderboard, closely matching models such as Vicuna-13B and Llama-2-13B-chat.


Overview

YeungNLP/firefly-llama-13b is a 13 billion parameter language model built on the Llama architecture. It was instruction-tuned with the QLoRA method on the extensive UltraChat dataset, which contains around 1.4 million multi-turn dialogues. A key advantage of this model is its resource-efficient training: it can be fine-tuned on a single GPU with as little as 16GB of VRAM, a significant reduction compared to full-parameter fine-tuning approaches such as the one used for Vicuna-13B.
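The single-GPU QLoRA setup described above can be sketched with the Hugging Face transformers and peft libraries. This is a minimal sketch only: the specific hyperparameters (LoRA rank, alpha, dropout, target modules) are illustrative assumptions, not the values YeungNLP actually used.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model -- this is what lets a
# 13B model fit in roughly 16GB of VRAM during fine-tuning.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters are the only trainable parameters; the rank and
# target modules below are illustrative, not the repo's actual settings.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# bnb_config would be passed to AutoModelForCausalLM.from_pretrained(...)
# as quantization_config, and lora_config to peft.get_peft_model(...).
```

The key design point is that the quantized base weights stay frozen; only the small LoRA adapter matrices receive gradients, which is why the memory footprint stays far below full-parameter fine-tuning.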

Performance

The model has been evaluated on the 🤗 Hugging Face Open LLM Leaderboard, demonstrating strong performance relative to its peers. It achieved an average score of 59.4, slightly outperforming vicuna-13b-1.1 and closely trailing Llama-2-13b-chat and vicuna-13b-v1.3. Its scores across the individual benchmarks, ARC, HellaSwag, MMLU, and TruthfulQA (MC), indicate solid general conversational and reasoning capabilities.

Model                 Average  ARC   HellaSwag  MMLU  TruthfulQA (MC)
Llama-2-13b-chat-hf   59.9     59.0  81.9       54.6  44.1
firefly-llama-13b     59.4     59.0  79.7       49.1  49.6
vicuna-13b-1.1        59.2     52.7  80.1       51.9  52.1
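As a sanity check, the Average column is simply the mean of the four benchmark scores, rounded to one decimal place. A few lines of Python confirm this for the table above:

```python
# Benchmark scores from the table: ARC, HellaSwag, MMLU, TruthfulQA (MC),
# paired with the reported leaderboard average.
scores = {
    "Llama-2-13b-chat-hf": ([59.0, 81.9, 54.6, 44.1], 59.9),
    "firefly-llama-13b":   ([59.0, 79.7, 49.1, 49.6], 59.4),
    "vicuna-13b-1.1":      ([52.7, 80.1, 51.9, 52.1], 59.2),
}

def mean_matches(benchmarks, reported, tol=0.051):
    """True if the reported average equals the mean to within rounding."""
    return abs(sum(benchmarks) / len(benchmarks) - reported) <= tol

# Every reported average in the table is the mean of its four scores.
assert all(mean_matches(b, avg) for b, avg in scores.values())
```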

Key Differentiators

  • Efficient Fine-tuning: Utilizes QLoRA, enabling fine-tuning of a 13B model with minimal hardware (e.g., 16GB VRAM).
  • Competitive Performance: Achieves benchmark scores comparable to larger or more resource-intensive models like Vicuna-13B and Llama-2-13B-chat.
  • Instruction-tuned: Optimized for multi-turn conversational tasks through training on the UltraChat dataset.

Use Cases

This model is suitable for applications requiring a capable 13B language model that can be deployed or further fine-tuned with limited computational resources. Its instruction-tuned nature makes it well-suited for conversational AI, chatbots, and general-purpose language generation tasks where performance close to leading 13B models is desired without the high training overhead.
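For conversational deployment, multi-turn history must be flattened into a single prompt string before generation. The helper below is a generic sketch of that step: the Human/Assistant layout is an assumed convention for illustration, not the documented chat template of firefly-llama-13b, so check the repository's actual format before deploying.

```python
def build_chat_prompt(turns, reply_prefix="Assistant:"):
    """Flatten (role, text) turns into a single prompt string.

    NOTE: this Human/Assistant layout is a generic convention, not the
    documented template for firefly-llama-13b.
    """
    lines = [f"{role}: {text}" for role, text in turns]
    lines.append(reply_prefix)  # cue the model to generate the next reply
    return "\n".join(lines)

prompt = build_chat_prompt([
    ("Human", "What is QLoRA?"),
    ("Assistant", "A 4-bit quantized LoRA fine-tuning method."),
    ("Human", "Why does that help?"),
])
```

The resulting string would be passed to the tokenizer and model's generate call, with generation stopped when the model emits the next "Human:" marker.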