Softechlb/Llama_2_13b_NEE
Llama 2 13B is a 13 billion parameter auto-regressive language model developed by Meta, part of the Llama 2 family of generative text models. This pretrained variant, converted for Hugging Face Transformers, utilizes an optimized transformer architecture and was trained on 2 trillion tokens of publicly available online data with a 4096-token context length. It is intended for commercial and research use in English, adaptable for various natural language generation tasks.
Llama 2 13B: Pretrained Generative Text Model
This model is the 13 billion parameter pretrained variant from Meta's Llama 2 family of large language models, converted for the Hugging Face Transformers format. Llama 2 models are auto-regressive language models built with an optimized transformer architecture. The entire Llama 2 family was trained on 2 trillion tokens of a new mix of publicly available online data, with a pretraining data cutoff of September 2022.
Key Capabilities & Features
- Architecture: Optimized transformer architecture for generative text tasks.
- Scale: 13 billion parameters, offering a balance between performance and computational requirements.
- Training Data: Pretrained on 2.0 trillion tokens from publicly available online sources.
- Context Length: Supports a context length of 4096 tokens.
- Intended Use: Designed for commercial and research applications in English; as a pretrained base model, it can be adapted to a wide range of natural language generation tasks.
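The capabilities above can be exercised through the Hugging Face Transformers API. The sketch below assumes the standard Hub repository id for this converted checkpoint, `meta-llama/Llama-2-13b-hf` (access is gated behind Meta's license agreement on the Hub); the helper names `generation_config` and `run_demo` are illustrative, not part of any library.

```python
# Minimal usage sketch for the pretrained (non-chat) 13B checkpoint.
# Assumptions: Hub id "meta-llama/Llama-2-13b-hf" and license access granted.

MODEL_ID = "meta-llama/Llama-2-13b-hf"  # assumed Hub id
CONTEXT_LENGTH = 4096  # maximum context length in tokens, per the model card


def generation_config(max_new_tokens: int = 128) -> dict:
    """Conservative sampling settings for a base (non-chat) model."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
    }


def run_demo(prompt: str = "The capital of France is") -> str:
    """Download the fp16 weights (roughly 26 GB) and complete a prompt.

    Imports are local so the module can be read and tested without
    torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # halves memory versus fp32
        device_map="auto",          # spread layers across available devices
    )
    # Base models simply continue text; there is no chat template here.
    inputs = tokenizer(
        prompt, return_tensors="pt", truncation=True, max_length=CONTEXT_LENGTH
    ).to(model.device)
    output = model.generate(**inputs, **generation_config())
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Because this is a base model rather than a Llama-2-Chat variant, prompts should be phrased as text to be continued, not as instructions or dialogue turns.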
Differentiators & Performance
Compared to its Llama 1 13B predecessor, Llama 2 13B improves across several academic benchmarks, including Code (24.5 vs 18.9), Math (28.7 vs 10.9), and MMLU (54.8 vs 46.9). While the fine-tuned Llama-2-Chat models are optimized for dialogue and score well in human evaluations of helpfulness and safety, this model is the pretrained base variant, which makes it a flexible starting point for adaptation to diverse NLP tasks. Meta offsets 100% of the carbon emissions from the training process.