deinon-daemon/axolotl-13b-chat-qlora-dev

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4k · License: llama2 · Architecture: Transformer · Open Weights · Cold

deinon-daemon/axolotl-13b-chat-qlora-dev is a 13 billion parameter instruct-tuned chat model, fine-tuned from Llama-2-13b-chat-hf. Developed by deinon-daemon, it uses QLoRA and Flash Attention for efficient training on a 40k-example slice of the Open-Orca dataset. The model is a proof of concept for a small-is-powerful approach to chat model development, aiming for performance comparable to other Llama/Alpaca/Guanaco/Vicuna models of similar scale.


Overview

deinon-daemon/axolotl-13b-chat-qlora-dev is a 13 billion parameter instruct-tuned chat model, fine-tuned from the Llama-2-13b-chat-hf checkpoint. It represents a rapid development effort by deinon-daemon, trained over approximately 9 hours on a single NVIDIA A100 GPU.

Key Capabilities & Training

  • Efficient Fine-tuning: Combines 4-bit quantization via bitsandbytes with QLoRA adapters, plus Flash Attention (using einops and ninja for Ampere-optimized kernels).
  • Dataset: Fine-tuned for 3 epochs on a 40k slice of the Open-Orca dataset, augmented with self-collected contextual QA chat data.
  • Prompt Templating: All training examples were processed and templated into a standard chat instruct prompt format.
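The card does not publish the exact template used, but since the base model is Llama-2-13b-chat-hf, a reasonable assumption is the standard Llama-2 chat instruct format. A minimal sketch of templating a training example under that assumption (function names are illustrative):

```python
def format_llama2_chat(system: str, user: str) -> str:
    """Template a single-turn example into the standard Llama-2 chat prompt format.

    Assumes the base model's convention: a <<SYS>> block inside the first
    [INST] ... [/INST] span. This is an illustration, not the card's verbatim template.
    """
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


def to_training_text(system: str, user: str, assistant: str) -> str:
    """Append the target response and end-of-sequence marker for supervised fine-tuning."""
    return format_llama2_chat(system, user) + f" {assistant} </s>"
```

For example, `to_training_text("You are a helpful assistant.", "What is QLoRA?", "QLoRA is ...")` yields a single string the tokenizer can consume, with the loss typically masked to the tokens after `[/INST]`.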

Performance & Purpose

  • Comparative Performance: Initial assessments suggest performance at least on par with, if not slightly better than, other fine-tuned Llama/Alpaca/Guanaco/Vicuna models of this scale.
  • Proof of Concept: This model is explicitly tagged as a 'dev' version, serving as a proof of concept for efficient fine-tuning methodologies. Further evaluation and benchmarking, particularly against models like stabilityai/StableBeluga13B, are planned for future production releases.
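For readers wanting to reproduce a setup like the one described under Key Capabilities, the following is a minimal sketch of a QLoRA configuration using bitsandbytes and peft. The card does not publish its hyperparameters; the LoRA rank, target modules, and dtype choices below are illustrative assumptions, and running this requires a CUDA GPU and access to the base weights.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: quantize the frozen base weights to 4-bit NF4, compute in bf16 (Ampere/A100).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # Flash Attention, as the card notes
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Trainable low-rank adapters on the attention projections (values are assumptions).
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```

With this configuration only the adapter weights train, which is what makes a 13B fine-tune fit in a single A100's memory budget.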