Softechlb/Llama_2_13b_NEE
Llama 2 13B is a 13 billion parameter auto-regressive language model developed by Meta, part of the Llama 2 family of generative text models. This pretrained variant, converted for Hugging Face Transformers, utilizes an optimized transformer architecture and was trained on 2 trillion tokens of publicly available online data with a 4096-token context length. It is intended for commercial and research use in English, adaptable for various natural language generation tasks.
Llama 2 13B: Pretrained Generative Text Model
This model is the 13 billion parameter pretrained variant from Meta's Llama 2 family of large language models, converted for the Hugging Face Transformers format. Llama 2 models are auto-regressive language models built with an optimized transformer architecture. The entire Llama 2 family was trained on 2 trillion tokens of a new mix of publicly available online data, with a pretraining data cutoff of September 2022.
Key Capabilities & Features
- Architecture: Optimized transformer architecture for generative text tasks.
- Scale: 13 billion parameters, offering a balance between performance and computational requirements.
- Training Data: Pretrained on 2.0 trillion tokens from publicly available online sources.
- Context Length: Supports a context length of 4096 tokens.
- Intended Use: Designed for commercial and research applications in English; as a pretrained base model, it can be adapted to a wide range of natural language generation tasks.
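The capabilities above can be exercised through the Hugging Face Transformers API. The sketch below assumes the standard Hub repository id for this converted checkpoint, `meta-llama/Llama-2-13b-hf` (access is gated behind Meta's license agreement on the Hub); the helper names `generation_config` and `run_demo` are illustrative, not part of any library.

```python
# Minimal usage sketch for the pretrained (non-chat) 13B checkpoint.
# Assumptions: Hub id "meta-llama/Llama-2-13b-hf" and license access granted.

MODEL_ID = "meta-llama/Llama-2-13b-hf"  # assumed Hub id
CONTEXT_LENGTH = 4096  # maximum context length in tokens, per the model card


def generation_config(max_new_tokens: int = 128) -> dict:
    """Conservative sampling settings for a base (non-chat) model."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
    }


def run_demo(prompt: str = "The capital of France is") -> str:
    """Download the fp16 weights (roughly 26 GB) and complete a prompt.

    Imports are local so the module can be read and tested without
    torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # halves memory versus fp32
        device_map="auto",          # spread layers across available devices
    )
    # Base models simply continue text; there is no chat template here.
    inputs = tokenizer(
        prompt, return_tensors="pt", truncation=True, max_length=CONTEXT_LENGTH
    ).to(model.device)
    output = model.generate(**inputs, **generation_config())
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Because this is a base model rather than a Llama-2-Chat variant, prompts should be phrased as text to be continued, not as instructions or dialogue turns.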
Differentiators & Performance
Compared to its Llama 1 13B predecessor, Llama 2 13B improves across several academic benchmarks, including Code (24.5 vs 18.9), Math (28.7 vs 10.9), and MMLU (54.8 vs 46.9). While the fine-tuned Llama-2-Chat models are optimized for dialogue and score well in human evaluations of helpfulness and safety, this model is the pretrained base variant, which makes it a flexible starting point for adaptation to diverse NLP tasks. Meta offsets 100% of the carbon emissions from the training process.