Softechlb/Llama_2_13b_NEE

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4K · Architecture: Transformer

Llama 2 13B is a 13 billion parameter auto-regressive language model developed by Meta, part of the Llama 2 family of generative text models. This pretrained variant, converted for Hugging Face Transformers, utilizes an optimized transformer architecture and was trained on 2 trillion tokens of publicly available online data with a 4096-token context length. It is intended for commercial and research use in English, adaptable for various natural language generation tasks.


Llama 2 13B: Pretrained Generative Text Model

This model is the 13 billion parameter pretrained variant from Meta's Llama 2 family of large language models, converted for the Hugging Face Transformers format. Llama 2 models are auto-regressive language models built with an optimized transformer architecture. The entire Llama 2 family was trained on 2 trillion tokens of a new mix of publicly available online data, with a pretraining data cutoff of September 2022.
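Because the checkpoint is in the Hugging Face Transformers format, it can be loaded with the standard `AutoModelForCausalLM` API. The following is a minimal sketch; the repo id is taken from this card's title, and the dtype, device placement, and sampling settings are illustrative assumptions rather than requirements of the model.

```python
# Minimal sketch of loading and prompting this checkpoint with Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Softechlb/Llama_2_13b_NEE"  # repo id from this card's title

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 13B parameters need roughly 26 GB in fp16
    device_map="auto",          # spread layers across available devices
)

# This is a pretrained base model with no chat template, so prompt it
# for text continuation rather than dialogue.
prompt = "The three most common uses of large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```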

Key Capabilities & Features

  • Architecture: Optimized transformer architecture for generative text tasks.
  • Scale: 13 billion parameters, offering a balance between performance and computational requirements.
  • Training Data: Pretrained on 2.0 trillion tokens from publicly available online sources.
  • Context Length: Supports a context length of 4096 tokens (see the truncation sketch after this list).
  • Intended Use: Designed for commercial and research applications in English; as a base model, it can be adapted to a variety of natural language generation tasks.
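Prompts longer than the 4096-token window must be truncated before generation. A small sketch, reusing the `tokenizer` and `model` from the loading example above; the 256-token output budget is an illustrative choice, not a value from this card.

```python
# Keep a long prompt within the model's 4096-token context window.
MAX_CTX = 4096        # context length stated on this card
OUTPUT_BUDGET = 256   # tokens reserved for the generated continuation

long_document = "..."  # placeholder for input text that may exceed the window

inputs = tokenizer(
    long_document,
    truncation=True,
    max_length=MAX_CTX - OUTPUT_BUDGET,  # leave room for new tokens
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=OUTPUT_BUDGET)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```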

Differentiators & Performance

Compared to its Llama 1 13B predecessor, Llama 2 13B improves on several grouped academic benchmarks (higher is better), including Code (24.5 vs 18.9), Math (28.7 vs 10.9), and MMLU (54.8 vs 46.9). While the fine-tuned Llama-2-Chat models are optimized for dialogue and score strongly in human evaluations of helpfulness and safety, this model is the pretrained base, offering flexibility for adaptation to diverse NLP tasks. Meta's sustainability program offsets 100% of the carbon emissions from the training process.