yahma/llama-13b-hf

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4k · Published: Apr 8, 2023 · License: Other · Architecture: Transformer

The yahma/llama-13b-hf is a 13-billion-parameter auto-regressive language model based on the Transformer architecture, developed by the FAIR team at Meta AI. This version is a conversion of the original LLaMA-13B model, updated in April 2023 for compatibility with HuggingFace Transformers and to address EOS token issues. Primarily intended for research on large language models, it is well suited to exploring applications such as question answering and natural language understanding.


Model Overview

The yahma/llama-13b-hf is a 13-billion-parameter LLaMA model, developed by Meta AI's FAIR team. This specific version is a conversion of the original LLaMA-13B, updated in April 2023 to ensure compatibility with HuggingFace Transformers and to resolve end-of-sequence (EOS) token issues. LLaMA models are auto-regressive language models built on the Transformer architecture.
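"Auto-regressive" here means the model generates one token at a time, each prediction conditioned on everything produced so far, stopping when the EOS token is emitted. The loop below is a minimal, self-contained sketch of that decoding process; `next_token_scores` is a hypothetical stand-in for a real model's forward pass (LLaMA would instead return logits over a vocabulary of tens of thousands of tokens):

```python
# Toy illustration of auto-regressive (greedy) decoding.
# `next_token_scores` is a dummy stand-in for a real language
# model's forward pass, not the actual LLaMA model.

def next_token_scores(tokens):
    # Deterministically scores a tiny vocabulary based on the
    # last token seen, mimicking "conditioned on the prefix".
    vocab = ["the", "cat", "sat", "<eos>"]
    last = tokens[-1] if tokens else "<bos>"
    transitions = {"<bos>": "the", "the": "cat", "cat": "sat", "sat": "<eos>"}
    return {w: (1.0 if w == transitions.get(last) else 0.0) for w in vocab}

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)  # greedy: take the top-scoring token
        if best == "<eos>":                 # stop at end-of-sequence
            break
        tokens.append(best)
    return tokens

print(generate([]))  # → ['the', 'cat', 'sat']
```

Real decoding replaces the greedy `max` with sampling strategies (temperature, top-p), but the prefix-conditioned loop and the EOS stopping condition are the same; this is why the EOS token fix in this conversion matters for generation quality.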

Key Characteristics

  • Architecture: Transformer-based, auto-regressive language model.
  • Parameter Count: 13 billion parameters.
  • Context Length: 4096 tokens.
  • Training Data: Trained on a diverse dataset including CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), and Stack Exchange (2%). The training data includes 20 languages, though English constitutes the majority.
  • Performance: Demonstrates strong performance on common-sense reasoning benchmarks; the 13B variant scores 78.1% on BoolQ and 79.2% on HellaSwag.
  • License: Released under a bespoke non-commercial license, primarily for research use.

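The sampling mixture listed above can be written down directly. A small sketch, using the source names and percentages from the list (the weight-normalization helper itself is illustrative, not part of the released model):

```python
# Pre-training data mixture for LLaMA, as listed above
# (approximate percent of the training corpus per source).
DATA_MIX = {
    "CCNet": 67.0,
    "C4": 15.0,
    "GitHub": 4.5,
    "Wikipedia": 4.5,
    "Books": 4.5,
    "ArXiv": 2.5,
    "Stack Exchange": 2.0,
}

# Sanity check: the stated proportions cover the full corpus.
assert abs(sum(DATA_MIX.values()) - 100.0) < 1e-9

# Normalized weights, e.g. for weighted sampling of training shards.
weights = {name: pct / 100.0 for name, pct in DATA_MIX.items()}
print(weights["CCNet"])  # → 0.67
```

Note the heavy skew toward web text (CCNet plus C4 is 82% of the mix), which is consistent with the observation that English dominates the 20 languages in the training data.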
Intended Use Cases

This model is primarily intended for research purposes in large language models, including:

  • Exploring potential applications such as question answering, natural language understanding, and reading comprehension.
  • Understanding the capabilities and limitations of current language models.
  • Developing techniques to improve language models.
  • Evaluating and mitigating biases, risks, toxic content generation, and hallucinations.

As a foundation model, it is not intended for direct deployment in downstream applications without further risk evaluation and mitigation: it has not been trained with human feedback and may generate unhelpful or harmful content.