jploski/llama-7b-hf
LLaMA-7B: A Foundation Model for Research
jploski/llama-7b-hf is a 7 billion parameter LLaMA model, originally developed by Meta AI's FAIR team. This specific repository provides the LLaMA-7B weights, re-sharded into smaller files to facilitate loading in environments with limited memory, such as free Google Colab instances. The model is an auto-regressive language model built on the transformer architecture.
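The idea behind re-sharding is simple: split one large checkpoint into several smaller files so that no single file has to fit in memory at once. The sketch below illustrates the technique with plain byte buffers standing in for weight tensors; the function and budget names (`shard_state_dict`, `MAX_SHARD_BYTES`) are illustrative, not taken from this repository, and the real shards are serialized tensor files of several hundred MB each.

```python
# Minimal sketch of checkpoint re-sharding, assuming weights can be treated
# as named blobs. Real checkpoints hold tensors; bytes keep this self-contained.
MAX_SHARD_BYTES = 1024  # tiny budget for illustration; real shards are far larger


def shard_state_dict(state_dict, max_bytes=MAX_SHARD_BYTES):
    """Split a dict of named blobs into smaller dicts, each under max_bytes."""
    shards, current, current_size = [], {}, 0
    for name, blob in state_dict.items():
        size = len(blob)
        if current and current_size + size > max_bytes:
            # Current shard is full: start a new one.
            shards.append(current)
            current, current_size = {}, 0
        current[name] = blob
        current_size += size
    if current:
        shards.append(current)
    return shards


# Toy "checkpoint": four 1024-byte weights, so each lands in its own shard.
state = {f"layer{i}.weight": bytes(1024) for i in range(4)}
shards = shard_state_dict(state)
merged = {k: v for shard in shards for k, v in shard.items()}
assert merged.keys() == state.keys()  # shards reassemble the full checkpoint
```

Loading then proceeds shard by shard, so peak memory is bounded by the largest shard plus the partially assembled model rather than by one monolithic file.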
Key Characteristics & Training:
- Architecture: Transformer-based, auto-regressive language model.
- Parameters: 7 billion.
- Training Data: Trained on a diverse dataset including CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), and Stack Exchange (2%). The dataset includes 20 languages, though it is predominantly English.
- Context Length: 2048 tokens.
- Performance: Achieves 76.5% on BoolQ, 79.8% on PIQA, and 76.1% on HellaSwag for common sense reasoning tasks.
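"Auto-regressive" means the model generates one token at a time, conditioning each prediction on everything generated so far. The toy loop below shows only that control flow; `next_token` is a hypothetical stand-in for a forward pass through the transformer, not the real LLaMA API.

```python
# Greedy auto-regressive decoding loop (schematic). next_token is a toy
# stand-in for the model's forward pass: it maps a context to one token id.
def next_token(context):
    # Toy "model": sum of the context modulo a tiny vocabulary of 11 tokens.
    return sum(context) % 11


def generate(prompt_tokens, n_new, eos=None):
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        tok = next_token(tokens)  # condition on prompt + all prior output
        if tok == eos:
            break
        tokens.append(tok)  # the new token becomes part of the next context
    return tokens


out = generate([3, 5], n_new=4)
# → [3, 5, 8, 5, 10, 9]
```

The key property is the feedback loop: each appended token changes the context for the next prediction, which is exactly how a transformer decoder like LLaMA produces text.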
Intended Use & Limitations:
LLaMA-7B is primarily intended for research purposes in large language models. This includes exploring applications like question answering and natural language understanding, evaluating model capabilities and limitations, and studying biases or harmful content generation. It is a foundational model and not intended for direct deployment in downstream applications without further risk evaluation and mitigation. As it was not trained with human feedback, it may generate toxic, offensive, or incorrect information. The model operates under a non-commercial bespoke license.