YeungNLP/firefly-llama-13b
YeungNLP/firefly-llama-13b is a 13 billion parameter Llama-based language model developed by YeungNLP, fine-tuned using the QLoRA method on the UltraChat dataset, comprising approximately 1.4 million multi-turn dialogues. This model is notable for its efficient training, requiring only a single GPU, and achieves competitive performance on the Open LLM Leaderboard, closely matching models like Vicuna-13B and Llama-2-13B-chat.
Overview
YeungNLP/firefly-llama-13b is a 13 billion parameter language model built on the Llama architecture. It was instruction-tuned with the QLoRA method on the UltraChat dataset, which contains around 1.4 million multi-turn dialogues. A key advantage of this model is its resource-efficient training: it can be fine-tuned on a single GPU with as little as 16GB of VRAM, a significant reduction compared to full-parameter fine-tuning approaches such as the one used for Vicuna-13B.
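To make the memory footprint concrete, here is a minimal sketch (not taken from the model card) of loading the model with the same 4-bit NF4 quantization that QLoRA uses, via transformers and bitsandbytes. The quantization settings shown are typical QLoRA defaults, not confirmed training settings for this model.

```python
# Minimal sketch: load firefly-llama-13b with 4-bit NF4 quantization
# (the same scheme QLoRA uses), so the 13B weights fit on a single GPU.
# Requires: transformers, accelerate, bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "YeungNLP/firefly-llama-13b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on the available GPU(s)
)
```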
Performance
The model has been evaluated on the 🤗 Hugging Face Open LLM Leaderboard, where it performs strongly relative to its peers. It scored an average of 59.4, slightly outperforming vicuna-13b-1.1 and trailing Llama-2-13b-chat-hf by only 0.5 points; it also lands close to vicuna-13b-v1.3 (not shown in the table below). Its scores on the individual benchmarks, ARC, HellaSwag, MMLU, and TruthfulQA (MC), reflect its general conversational and reasoning capabilities.
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA (MC) |
|---|---|---|---|---|---|
| Llama-2-13b-chat-hf | 59.9 | 59.0 | 81.9 | 54.6 | 44.1 |
| firefly-llama-13b | 59.4 | 59.0 | 79.7 | 49.1 | 49.6 |
| vicuna-13b-1.1 | 59.2 | 52.7 | 80.1 | 51.9 | 52.1 |
Key Differentiators
- Efficient Fine-tuning: Utilizes QLoRA, enabling fine-tuning of a 13B model with minimal hardware (e.g., 16GB VRAM); see the sketch after this list.
- Competitive Performance: Achieves benchmark scores comparable to larger or more resource-intensive models like Vicuna-13B and Llama-2-13B-chat.
- Instruction-tuned: Optimized for multi-turn conversational tasks through training on the UltraChat dataset.
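To illustrate how a QLoRA-style setup stays within a modest VRAM budget, the sketch below freezes the 4-bit base model loaded in the Overview and attaches small trainable LoRA adapters with the peft library. The rank, alpha, and target modules are common choices for Llama models, not the exact configuration YeungNLP used.

```python
# Illustrative QLoRA-style setup with PEFT: the 4-bit base model stays frozen
# and only low-rank adapter matrices on the attention projections are trained.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)  # cast norms, enable input grads

lora_config = LoraConfig(
    r=64,               # adapter rank (assumed; typical QLoRA choice)
    lora_alpha=16,      # scaling factor
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Llama attention projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the 13B parameters
```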
Use Cases
This model is suitable for applications requiring a capable 13B language model that can be deployed or further fine-tuned with limited computational resources. Its instruction-tuned nature makes it well-suited for conversational AI, chatbots, and general-purpose language generation tasks where performance close to leading 13B models is desired without the high training overhead.
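For a quick smoke test of conversational use, the following sketch generates a single-turn reply, reusing the quantized model and tokenizer loaded above. The `<s>{input}</s>` prompt layout mirrors the format documented for other Firefly models and is an assumption here; check the model card for the exact template.

```python
# Single-turn generation example; the prompt template is an assumption.
prompt = "Tell me about the history of the Great Wall."
text = f"<s>{prompt}</s>"  # assumed Firefly-style single-turn format

inputs = tokenizer(text, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.1,
    eos_token_id=tokenizer.eos_token_id,
)

# Strip the prompt tokens and decode only the newly generated reply.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```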