anonymous4chan/llama-2-13b: An Overview
This model is the pretrained 13-billion-parameter member of Meta's Llama 2 family of large language models, converted to the Hugging Face Transformers format. Llama 2 models are auto-regressive language models built on an optimized transformer architecture. The family spans 7B to 70B parameters, in both pretrained and fine-tuned (Llama-2-Chat) variants.
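Since the checkpoint is in the Transformers format, it can be loaded with the standard `AutoModelForCausalLM` / `AutoTokenizer` API. The sketch below is illustrative, not an official recipe: the repo id is taken from the title of this page (the official gated checkpoint lives at `meta-llama/Llama-2-13b-hf`), and the `generate` helper and its sampling settings are our own choices.

```python
MODEL_ID = "anonymous4chan/llama-2-13b"  # repo id from the title above;
                                         # official gated repo: meta-llama/Llama-2-13b-hf

def load_model(model_id: str = MODEL_ID):
    """Load tokenizer + model in half precision (~26 GB of fp16 weights)."""
    # Imported lazily so the sketch can be read without the heavy deps installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # fp16 halves memory vs fp32
        device_map="auto",          # needs `accelerate` for automatic placement
    )
    return tokenizer, model

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-ish sampled continuation of a prompt (illustrative settings)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
    )
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Note that this is the base (pretrained) model, so it continues text rather than following chat-style instructions; prompts should be phrased as completions.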
Key Capabilities & Characteristics
- Architecture: Optimized transformer architecture, auto-regressive language model.
- Training Data: Pretrained on 2 trillion tokens from a new mix of publicly available online data, with a data cutoff of September 2022.
- Context Length: 4,096 tokens (4k).
- Intended Use: Designed for commercial and research applications in English. This pretrained model can be adapted for various natural language generation tasks.
- Performance: The 13B Llama 2 model shows improved performance over Llama 1 13B across academic benchmarks, including Code (24.5 vs 18.9), Commonsense Reasoning (66.9 vs 66.1), World Knowledge (55.4 vs 52.6), and MMLU (54.8 vs 46.9).
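The 4k context length above is a hard limit: prompt tokens beyond 4,096 must be dropped before generation. A minimal sketch of client-side truncation (`truncate_to_context` is an illustrative helper of ours, not a Transformers API):

```python
MAX_CONTEXT = 4096  # Llama 2 context window, in tokens

def truncate_to_context(token_ids, max_len=MAX_CONTEXT, keep="end"):
    """Keep at most max_len tokens. 'end' keeps the most recent tokens,
    which usually matters most for auto-regressive continuation."""
    if len(token_ids) <= max_len:
        return token_ids
    return token_ids[-max_len:] if keep == "end" else token_ids[:max_len]

ids = list(range(5000))                    # stand-in for tokenizer output
print(len(truncate_to_context(ids)))       # → 4096
print(truncate_to_context(ids)[0])         # → 904 (oldest 904 tokens dropped)
```

In practice the same effect can be had at tokenization time via the tokenizer's own truncation options; the helper just makes the arithmetic explicit.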
Important Considerations
- License: Use is governed by a custom commercial license from Meta, requiring acceptance on their website before access.
- Limitations: As with all LLMs, it carries risks of producing inaccurate, biased, or objectionable responses. Developers should perform safety testing and tuning for specific applications. It is not intended for use in languages other than English.
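For the officially gated weights, access works roughly as follows: accept Meta's license on the Hugging Face model page, then authenticate with an access token before downloading. A sketch of that workflow (the mirror named in this page's title may not enforce the gate, but the license terms still apply; the repo id `meta-llama/Llama-2-13b-hf` and local directory name are assumptions):

```shell
# Sketch: fetching the official gated checkpoint after accepting the license.
pip install -U "huggingface_hub[cli]"
huggingface-cli login          # paste your Hugging Face access token when prompted
huggingface-cli download meta-llama/Llama-2-13b-hf --local-dir llama-2-13b
```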