ctrltokyo/llama-2-7b-hf-dolly-flash-attention
The ctrltokyo/llama-2-7b-hf-dolly-flash-attention model is a 7-billion-parameter variant of Llama-2-7b-hf, fine-tuned by ctrltokyo on the Databricks Dolly-15k dataset with Flash Attention 2 enabled during training. The instruction-following nature of Dolly-15k makes it suitable for generalized chatbot and conversational tasks, distinguishing it from models optimized for code or other specialized functions.
Model Overview
This model, ctrltokyo/llama-2-7b-hf-dolly-flash-attention, is a 7 billion parameter language model based on the NousResearch/Llama-2-7b-hf architecture. It has been fine-tuned by ctrltokyo using the databricks/databricks-dolly-15k dataset, with all training incorporating Flash Attention 2 for efficiency.
Key Characteristics
- Base Model: NousResearch/Llama-2-7b-hf (7B parameters).
- Fine-tuning Dataset: databricks/databricks-dolly-15k, which focuses on instruction-following capabilities for general-purpose chatbots.
- Training Optimization: Utilizes Flash Attention 2, potentially offering performance benefits during training and inference (see the loading sketch after this list).
- Intended Use: Primarily designed for generalized chatbot applications and conversational AI.
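As a sketch of how the checkpoint could be loaded with Flash Attention 2 through the transformers library: the `attn_implementation` argument assumes a recent transformers release (4.36 or later), the flash-attn package, and an Ampere-or-newer GPU. Treat this as a minimal example, not a documented recipe from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ctrltokyo/llama-2-7b-hf-dolly-flash-attention"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Request Flash Attention 2 at load time; transformers raises an error
# if the flash-attn package or a compatible GPU is unavailable.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,               # half precision for the weights
    attn_implementation="flash_attention_2",
    device_map="auto",                       # place weights on available GPU(s)
)
```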
Intended Use Cases
This model is best suited for:
- General-purpose chatbots: Engaging in diverse conversational interactions.
- Instruction following: Responding to a wide range of user prompts, reflecting the instruction-focused nature of the Dolly-15k dataset (see the generation sketch after this list).
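A minimal generation sketch, assuming the model and tokenizer loaded as above. The plain-instruction prompt format is an assumption; the model card does not specify a prompt template, so results may vary with phrasing.

```python
# Dolly-15k is instruction-formatted, so a plain natural-language
# instruction is a reasonable prompt; the exact template is an assumption.
prompt = "Explain the difference between a list and a tuple in simple terms."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```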
Limitations and Considerations
- No Code Support: The author explicitly states the model is not suitable for code-related tasks.
- No Further Optimization: The model has not undergone additional testing or optimization beyond the initial fine-tuning.
- VRAM Usage: Requires approximately 20GB of VRAM for raw (unquantized) model inference; see the quantization sketch below for a lower-memory option.
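If ~20GB of VRAM is not available, one common workaround is 4-bit quantization via bitsandbytes. This is a general transformers technique, not something the model card documents for this checkpoint, and the memory figures below are rough estimates.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization typically brings a 7B model to roughly 4-6GB
# of VRAM, at some cost in output quality.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ctrltokyo/llama-2-7b-hf-dolly-flash-attention",
    quantization_config=bnb_config,
    device_map="auto",
)
```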