Enoch/llama-13b-hf

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4k · Published: Apr 13, 2023 · License: other · Architecture: Transformer

Enoch/llama-13b-hf is a 13-billion-parameter auto-regressive language model based on the transformer architecture, developed by Meta AI. This version is a HuggingFace conversion of the original LLaMA weights, intended primarily for research on large language models. It performs well on common sense reasoning, reading comprehension, and natural language understanding tasks, with a focus on studying model capabilities and limitations.


Model Overview

Enoch/llama-13b-hf is a 13 billion parameter LLaMA model, developed by Meta AI, converted for use with HuggingFace's Transformers library. This foundational model is an auto-regressive language model built on the transformer architecture, trained between December 2022 and February 2023.
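Because this is a HuggingFace conversion, it can be loaded with the standard Transformers API. The sketch below shows one way to do that; it assumes the `transformers` and `torch` packages are installed and that you have enough memory for a 13B checkpoint (roughly 26 GB in FP16). The `build_prompt` helper reflects that LLaMA is a base model: it does plain text continuation, with no chat template.

```python
def build_prompt(instruction: str) -> str:
    """LLaMA-13B is a base model: format input as plain text to continue,
    not as a chat message."""
    return instruction.strip() + "\n"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load Enoch/llama-13b-hf and generate a continuation.

    Imports are deferred so this sketch only needs `torch` and
    `transformers` when actually called.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Enoch/llama-13b-hf")
    model = AutoModelForCausalLM.from_pretrained(
        "Enoch/llama-13b-hf",
        torch_dtype=torch.float16,  # half precision to cut memory use
        device_map="auto",          # place layers on available devices
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding for reproducible output
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

For a quick smoke test, `generate(build_prompt("The capital of France is"))` should return a short factual continuation, though as a base model the output may trail off into unrelated text rather than stopping cleanly.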

Key Capabilities

  • Research Focus: Primarily designed for research into large language models, including exploring applications like question answering and natural language understanding.
  • Performance: Demonstrates strong performance across various reasoning tasks, including BoolQ (78.1%), PIQA (80.1%), and HellaSwag (79.2%).
  • Multilingual Data: While predominantly English, the training dataset included content from 20 languages, suggesting some multilingual understanding.
  • Bias Evaluation: Evaluated for biases across categories such as gender, religion, race, and sexual orientation, with an average bias score of 66.6.

Intended Use Cases

This model is best suited for:

  • Academic Research: Ideal for researchers studying LLM capabilities, limitations, and developing mitigation techniques for issues like bias and toxicity.
  • Understanding LLMs: Useful for exploring how large language models generate content, identify incorrect information, and exhibit biases.

Limitations

As a base model, LLaMA-13B has not been fine-tuned with human feedback and may generate toxic, offensive, or unhelpful content. It is not intended for direct deployment in downstream applications without further risk evaluation and mitigation.