guardrail/llama-2-7b-guanaco-dolly-8bit-sharded
Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4K · License: apache-2.0 · Architecture: Transformer · Open weights
The guardrail/llama-2-7b-guanaco-dolly-8bit-sharded model is a 7 billion parameter Llama 2-based language model. It was fine-tuned using QLoRA in 4-bit precision on the OpenAssistant Guanaco and Databricks Dolly-15k datasets. This sharded version is specifically designed for deployment on resource-constrained environments like free Google Colab instances, making it accessible for experimentation and development.
Model Overview
This model, guardrail/llama-2-7b-guanaco-dolly-8bit-sharded, is a 7 billion parameter variant of the Llama 2 architecture. It has undergone fine-tuning using the QLoRA method, which allows for efficient training in 4-bit precision.
Key Characteristics
- Base Model: Llama 2 (7B parameters).
- Fine-tuning: Utilizes QLoRA for efficient 4-bit precision fine-tuning.
- Training Data: Fine-tuned on a combination of two prominent instruction-following datasets: OpenAssistant Guanaco and Databricks Dolly-15k.
- Sharding: The model is sharded, making it suitable for deployment and use in environments with limited computational resources, such as free Google Colab instances.
- Loading: Designed to be loaded in 8-bit using `load_in_8bit=True` with `transformers.AutoModelForCausalLM`.
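A minimal loading sketch based on the card's `load_in_8bit=True` note might look like the following. It assumes a GPU runtime (e.g. a free Google Colab instance) with `transformers`, `accelerate`, and `bitsandbytes` installed; the helper name `load_8bit` is ours, not part of the card.

```python
MODEL_ID = "guardrail/llama-2-7b-guanaco-dolly-8bit-sharded"

def load_8bit():
    # Imports are kept local so the sketch can be read without the
    # heavy dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        load_in_8bit=True,   # 8-bit loading via bitsandbytes, as the card describes
        device_map="auto",   # place the shards on available devices automatically
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_8bit()
```

Because the checkpoint is sharded, `from_pretrained` downloads and places the weights shard by shard, which is what keeps peak memory low enough for free-tier GPUs.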
Use Cases
This model is particularly well-suited for:
- Resource-constrained environments: Its sharded nature and 8-bit loading make it ideal for experimentation on free-tier cloud services.
- Instruction-following tasks: Fine-tuning on the Guanaco and Dolly-15k datasets equips it to understand and execute user instructions.
- Prototyping and development: Offers an accessible entry point for developers to work with a Llama 2-based model without requiring significant GPU resources.
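For instruction-following experiments, generation can be sketched as below. The card does not document a prompt template, so the `### Instruction:` / `### Response:` format here is an assumption modeled on common Dolly-style fine-tunes; adjust it if the model responds poorly.

```python
MODEL_ID = "guardrail/llama-2-7b-guanaco-dolly-8bit-sharded"

def build_prompt(instruction: str) -> str:
    # Hypothetical Dolly-style template; not confirmed by the model card.
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

if __name__ == "__main__":
    # Import locally: requires transformers, accelerate, and bitsandbytes.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model=MODEL_ID,
        model_kwargs={"load_in_8bit": True},  # 8-bit loading, per the card
        device_map="auto",
    )
    prompt = build_prompt("Summarize what QLoRA is in one sentence.")
    result = generator(prompt, max_new_tokens=64, do_sample=False)
    print(result[0]["generated_text"])
```

Greedy decoding (`do_sample=False`) is used here for reproducibility during prototyping; sampling parameters can be added once a prompt format is settled.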