guardrail/llama-2-7b-guanaco-dolly-8bit-sharded
Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4k · License: apache-2.0 · Architecture: Transformer · Open weights

The guardrail/llama-2-7b-guanaco-dolly-8bit-sharded model is a 7 billion parameter Llama 2-based language model. It was fine-tuned using QLoRA in 4-bit precision on the OpenAssistant Guanaco and Databricks Dolly-15k datasets. This sharded version is specifically designed for deployment on resource-constrained environments like free Google Colab instances, making it accessible for experimentation and development.


Model Overview

This model, guardrail/llama-2-7b-guanaco-dolly-8bit-sharded, is a 7 billion parameter variant of the Llama 2 architecture. It has undergone fine-tuning using the QLoRA method, which allows for efficient training in 4-bit precision.

Key Characteristics

  • Base Model: Llama 2 (7B parameters).
  • Fine-tuning: Utilizes QLoRA for efficient 4-bit precision fine-tuning.
  • Training Data: Fine-tuned on a combination of two prominent instruction-following datasets: OpenAssistant Guanaco and Databricks Dolly-15k.
  • Sharding: The model is sharded, making it suitable for deployment and use in environments with limited computational resources, such as free Google Colab instances.
  • Loading: Designed to be loaded in 8-bit using load_in_8bit=True with transformers.AutoModelForCausalLM.
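The characteristics above can be sketched in code. This is a minimal, illustrative loading helper, not an official snippet from the model card: it assumes `transformers`, `accelerate`, and `bitsandbytes` are installed, and it defers the heavyweight imports so nothing downloads until the function is called. The memory estimate also suggests why 8-bit loading matters on free-tier hardware.

```python
# 7B parameters at 1 byte each (8-bit) is roughly 6.5 GiB of weights,
# versus roughly 13 GiB in fp16 -- the 8-bit footprint is what makes a
# free Colab GPU (~15 GB VRAM on a T4) workable for this model.
INT8_FOOTPRINT_GIB = 7e9 / 2**30


def load_guanaco_dolly(model_id="guardrail/llama-2-7b-guanaco-dolly-8bit-sharded"):
    """Load the sharded checkpoint in 8-bit, as the model card describes.

    Sketch only: requires `transformers`, `accelerate`, and `bitsandbytes`,
    plus enough disk/VRAM to hold the downloaded shards.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer  # deferred import

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        load_in_8bit=True,   # 8-bit weights via bitsandbytes
        device_map="auto",   # place shards across available GPU/CPU memory
    )
    return tokenizer, model
```

Because the checkpoint is sharded, `from_pretrained` streams the weight files one shard at a time, so peak host memory during loading stays well below the full model size.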

Use Cases

This model is particularly well-suited for:

  • Resource-constrained environments: Its sharded nature and 8-bit loading make it ideal for experimentation on free-tier cloud services.
  • Instruction-following tasks: The fine-tuning on Guanaco and Dolly-15k datasets suggests proficiency in understanding and executing user instructions.
  • Prototyping and development: Offers an accessible entry point for developers to work with a Llama 2-based model without requiring significant GPU resources.