learnanything/llama-7b-huggingface

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 16, 2023 · License: other · Architecture: Transformer

The learnanything/llama-7b-huggingface model is a 7 billion parameter auto-regressive language model developed by Meta AI's FAIR team, based on the transformer architecture. This version is adapted for Hugging Face's `LlamaModel` and `LlamaTokenizer`, and supports `AutoModel` and `AutoTokenizer` for easier integration. It is primarily intended for research on large language models, focusing on understanding their capabilities and limitations and on developing mitigation techniques for issues such as bias and harmful content generation.


LLaMA-7B Hugging Face Adaptation

This model is the 7 billion parameter version of LLaMA, an auto-regressive language model developed by Meta AI's FAIR team. It is based on the transformer architecture and was trained between December 2022 and February 2023. This specific repository provides an adaptation for seamless integration with Hugging Face's transformers library, supporting AutoModel and AutoTokenizer for straightforward loading, including an option for 8-bit quantization to reduce memory footprint.
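A minimal sketch of that loading path. The repository id below matches this page; the `load_llama` helper name is ours, not part of the library. The 8-bit path additionally requires the `bitsandbytes` package and a CUDA GPU, and assumes a transformers version that still accepts the `load_in_8bit` argument:

```python
def load_llama(model_id: str = "learnanything/llama-7b-huggingface",
               in_8bit: bool = False):
    """Load the LLaMA-7B adaptation via Hugging Face's Auto* classes.

    Sketch only: imports are deferred so the function can be defined
    without transformers installed; the actual weight download happens
    on first call.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if in_8bit:
        # 8-bit quantization to reduce the memory footprint;
        # needs bitsandbytes and a CUDA device.
        model = AutoModelForCausalLM.from_pretrained(
            model_id, device_map="auto", load_in_8bit=True)
    else:
        model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

`device_map="auto"` spreads layers across available accelerators; drop it to load everything on CPU.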

Key Characteristics

  • Architecture: Transformer-based, auto-regressive language model.
  • Parameters: 7 billion parameters.
  • Training Data: Trained on a diverse dataset including CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), and Stack Exchange (2%).
  • Multilingual Support: While primarily English-centric, the training data included 20 languages, suggesting some multilingual capability.
  • Research Focus: Designed as a foundational model for research into large language models, including exploring applications, understanding limitations, and developing bias/harm mitigation strategies.

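The reported data shares can be read as sampling weights over the pre-training corpus; a quick sanity check that the percentages listed above cover the whole mixture:

```python
# Reported pre-training data shares (percent), from the list above.
corpus_shares = {
    "CCNet": 67.0,
    "C4": 15.0,
    "GitHub": 4.5,
    "Wikipedia": 4.5,
    "Books": 4.5,
    "ArXiv": 2.5,
    "Stack Exchange": 2.0,
}

# Normalized sampling weights, as a data loader would use them.
total = sum(corpus_shares.values())
weights = {name: share / total for name, share in corpus_shares.items()}
```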
Intended Use Cases

  • Research: Ideal for researchers studying large language models, their capabilities, and limitations.
  • Application Exploration: Suitable for exploring potential applications such as question answering, natural language understanding, and reading comprehension.
  • Bias and Harm Mitigation: Useful for evaluating and developing techniques to mitigate biases, risks, toxic content generation, and hallucinations.
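A minimal question-answering probe along those lines, assuming a `tokenizer`/`model` pair already loaded via the Auto* classes; the prompt wording and the `answer` helper are illustrative, not part of the library:

```python
def answer(question: str, tokenizer, model, max_new_tokens: int = 64) -> str:
    """Greedy-decode a short answer from the base model.

    LLaMA-7B is not instruction-tuned, so a completion-style (or
    few-shot) prompt usually works better than a bare question.
    """
    prompt = f"Question: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    # Strip the prompt tokens, keep only the generated continuation.
    continuation = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(continuation, skip_special_tokens=True).strip()
```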

Performance Highlights

On common sense reasoning tasks, the 7B model achieved scores such as 76.5 on BoolQ, 79.8 on PIQA, and 76.1 on HellaSwag. It is important to note that LLaMA is a base model and has not been fine-tuned with human feedback, meaning it can generate unhelpful, incorrect, or offensive content. Users should conduct further risk evaluation and mitigation before deploying in downstream applications.
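Benchmarks like BoolQ, PIQA, and HellaSwag are typically scored zero-shot by comparing the log-likelihood the model assigns to each candidate continuation. A hedged sketch of that scoring loop; the helper names are ours, and real evaluation harnesses handle tokenization boundaries and length normalization more carefully:

```python
def continuation_logprob(tokenizer, model, context: str,
                         continuation: str) -> float:
    """Sum of token log-probs the model assigns to `continuation`
    when it follows `context`. Note: tokenizing the joined string may
    split differently at the boundary; this sketch ignores that."""
    import torch

    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids.to(model.device)).logits
    # Log-probs at position i predict token i + 1, hence the shift below.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    n_ctx = ctx_ids.shape[1]
    cont_ids = full_ids[0, n_ctx:]
    return logprobs[torch.arange(n_ctx - 1, full_ids.shape[1] - 1),
                    cont_ids].sum().item()

def pick_choice(scores):
    """Index of the highest-scoring candidate continuation."""
    return max(range(len(scores)), key=scores.__getitem__)
```

For a BoolQ item, for example, one would score the continuations "yes" and "no" after the passage-plus-question prompt and take the higher-scoring one as the prediction.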