VMware/open-llama-0.3T-7B-instruct-dolly-hhrlhf

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Context Length: 4k · License: apache-2.0 · Architecture: Transformer · Open Weights

VMware/open-llama-0.3T-7B-instruct-dolly-hhrlhf is a 7 billion parameter instruction-tuned causal language model developed by VMware, built upon a partially trained Open-LLaMA checkpoint (300B tokens). This model is fine-tuned using the mosaicml/dolly_hhrlhf instruction dataset, making it suitable for general instruction-following tasks. It is fully open-source and commercially viable, offering a 4096-token context window for diverse applications.


Model Overview

VMware/open-llama-0.3T-7B-instruct-dolly-hhrlhf is a 7 billion parameter instruction-tuned language model. It is based on a partially trained Open-LLaMA checkpoint (300 billion tokens) and further fine-tuned using the mosaicml/dolly_hhrlhf instruction dataset. This model is designed for general instruction-following and is notable for being fully open-source and commercially viable, with its components released under Apache-2.0 and CC-BY-SA-3.0 licenses respectively.

Key Features

  • Instruction-Tuned: Optimized for understanding and responding to user instructions, leveraging the dolly_hhrlhf dataset.
  • Open-Source & Commercial Use: Both the underlying language model and the instruction dataset are available under permissive licenses, allowing for broad commercial and research applications.
  • Context Length: Supports a context window of 4096 tokens, enabling processing of moderately long inputs.

Usage Considerations

When using this model with the Hugging Face Transformers library, it is important to load the tokenizer with add_bos_token = True, as the model was trained with a beginning-of-sequence (BOS) token. The model's documentation provides an example of generating a response to a prompt such as "how do I bake a cake?".
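The load-and-generate flow described above can be sketched as follows. This is a minimal sketch, not the model card's own snippet: the Alpaca-style prompt template in build_prompt is an assumption based on similar VMware instruct models, so verify the exact wording against the model's documentation.

```python
MODEL_ID = "VMware/open-llama-0.3T-7B-instruct-dolly-hhrlhf"


def build_prompt(instruction: str) -> str:
    # Alpaca-style instruction template (assumed; check the model card).
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:"
    )


def generate(instruction: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # add_bos_token=True is required: the model was trained with a BOS token.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, add_bos_token=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example usage (downloads the full 7B weights):
# print(generate("how do I bake a cake?"))
```

Keeping the prompt template in a helper makes it easy to confirm that generations stay within the 4096-token context window once the prompt and max_new_tokens are accounted for.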

Limitations

A known limitation is that the model builds on a partially trained Open-LLaMA checkpoint that had processed only 300 billion tokens, so its performance may lag behind models fine-tuned from fully trained base checkpoints.