deinon-daemon/axolotl-13b-chat-qlora-dev
deinon-daemon/axolotl-13b-chat-qlora-dev is a 13-billion-parameter instruct-tuned chat model, fine-tuned from Llama-2-13b-chat-hf. Developed by deinon-daemon, it uses QLoRA and Flash Attention for efficient training on a 40k slice of the Open-Orca dataset. The model is a proof of concept for a small-is-powerful approach to chat-model development, aiming for performance comparable to other Llama/Alpaca/Guanaco/Vicuna models of similar scale.
Overview
deinon-daemon/axolotl-13b-chat-qlora-dev is a 13 billion parameter instruct-tuned chat model, built upon the Llama-2-13b-chat-hf architecture. This model represents a rapid development effort by deinon-daemon, fine-tuned over approximately 9 hours using a single Nvidia A100 GPU.
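The model's name references Axolotl, a popular framework for exactly this kind of run (QLoRA on a quantized Llama-2 base with Flash Attention, on a single GPU). A hypothetical Axolotl config sketch for such a setup might look like the following; every value here is an illustrative assumption, not the actual training configuration:

```yaml
# Hypothetical Axolotl config sketch -- values are illustrative assumptions,
# not the published configuration for this model
base_model: meta-llama/Llama-2-13b-chat-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer

load_in_4bit: true          # bitsandbytes 4-bit quantization
adapter: qlora              # train LoRA adapters on top of the quantized base
lora_r: 64
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj

datasets:
  - path: Open-Orca/OpenOrca   # the card describes a 40k slice of Open-Orca
    type: alpaca
num_epochs: 3                  # matches the 3 epochs stated in the card

flash_attention: true          # relies on einops + ninja for kernel builds
sequence_len: 4096
micro_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 0.0002
```

A single A100 has enough memory for a 13B base in 4-bit plus LoRA adapter gradients, which is what makes the roughly 9-hour single-GPU run plausible.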
Key Capabilities & Training
- Efficient Fine-tuning: Leverages bitsandbytes quantization, QLoRA, and Flash Attention, with einops and ninja for Ampere-optimized kernel builds.
- Dataset: Fine-tuned for 3 epochs on a 40k slice of the Open-Orca dataset, augmented with self-collected contextual QA chat data.
- Prompt Templating: All training examples were processed and templated into a standard chat instruct prompt format.
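The card does not publish the exact chat instruct template used during training. Since the base model is Llama-2-13b-chat-hf, a minimal templating helper in the standard Llama-2 chat format is a reasonable sketch; the function name and default system message below are assumptions for illustration:

```python
def format_chat_prompt(user_message: str,
                       system_message: str = "You are a helpful assistant.") -> str:
    """Wrap a single-turn exchange in the Llama-2 chat instruct format.

    NOTE: this template is an assumption based on the Llama-2-chat base model;
    the model card does not specify the exact template used for fine-tuning.
    """
    return (
        f"<s>[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = format_chat_prompt("Summarize QLoRA in one sentence.")
```

Templating every training example into one consistent format like this is what lets a chat model learn a stable turn structure from heterogeneous QA data.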
Performance & Purpose
- Comparative Performance: Initial assessments suggest performance at least on par with, if not slightly better than, other fine-tuned Llama/Alpaca/Guanaco/Vicuna models of this scale.
- Proof of Concept: This model is explicitly tagged as a 'dev' version, serving as a proof of concept for efficient fine-tuning methodologies. Further evaluation and benchmarking, particularly against models like stabilityai/StableBeluga13B, are planned for future production releases.