Overview
This model is a 13 billion parameter variant of the Llama 2 architecture, developed by imone. Its primary distinction lies in the addition of two special tokens: <|end_of_turn|> (ID 32000) and <|PAD|> (ID 32001). The embedding vectors for these new tokens are initialized by taking the mean of all existing input/output token embeddings.
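The mean-initialization described above can be sketched as follows. This is a toy illustration, not the actual conversion script: the function name is hypothetical, the array shapes are small stand-ins for the real 32000 × 5120 Llama 2 13B embedding matrices, and in practice the same operation would be applied to both the input and output embedding tables.

```python
import numpy as np

# Append rows for the two new tokens (<|end_of_turn|> -> ID 32000,
# <|PAD|> -> ID 32001), each initialized to the mean of all pre-existing
# embedding rows. Toy shapes; the real table is 32000 x 5120.
def add_mean_initialized_rows(embedding, n_new):
    mean_vec = embedding.mean(axis=0, keepdims=True)   # (1, dim)
    new_rows = np.repeat(mean_vec, n_new, axis=0)      # (n_new, dim)
    return np.concatenate([embedding, new_rows], axis=0)

old = np.random.randn(8, 4)   # stand-in for the original embedding table
new = add_mean_initialized_rows(old, 2)
```

Initializing the new rows at the mean of the existing vocabulary is a common heuristic: it places the new tokens at a "neutral" point in embedding space rather than at random, which tends to stabilize early fine-tuning.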
Key Capabilities
- Explicit Turn Demarcation: The inclusion of an End-of-Turn (EOT) token allows for clearer signaling of conversational turns, which can be beneficial for dialogue systems and multi-turn interactions.
- Padding Support: The <|PAD|> token provides standard padding functionality, useful for batch processing and ensuring uniform input lengths.
- Llama 2 Foundation: Retains the core capabilities and performance characteristics of the original Llama 2 13B model, making it suitable for a wide range of natural language processing tasks.
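The two capabilities above can be sketched together. Note this is an illustrative sketch only: the model card does not specify a chat template, so the turn formatting and helper names below are assumptions, not official usage.

```python
# Assumed helpers demonstrating the two special tokens:
#   <|end_of_turn|> (ID 32000) closes each conversational turn;
#   <|PAD|> (ID 32001) right-pads sequences to a uniform batch length.
EOT_ID, PAD_ID = 32000, 32001
EOT = "<|end_of_turn|>"

def format_dialogue(turns):
    # Close every turn with the EOT token so turn boundaries are explicit.
    return "".join(turn + EOT for turn in turns)

def pad_batch(token_id_lists, pad_id=PAD_ID):
    # Right-pad every sequence with the PAD token ID to the batch maximum.
    longest = max(len(seq) for seq in token_id_lists)
    return [seq + [pad_id] * (longest - len(seq)) for seq in token_id_lists]

text = format_dialogue(["Hi there!", "Hello, how can I help?"])
batch = pad_batch([[1, 5, 9], [1, 5]])
```

In practice a tokenizer's built-in padding (e.g. setting the pad token and padding in batch encoding) would replace the manual `pad_batch` helper; it is spelled out here only to make the mechanics visible.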
Good For
- Conversational AI: Ideal for fine-tuning on dialogue datasets where explicit turn boundaries are crucial for model understanding and generation.
- Structured Text Generation: Use cases where clear segmentation of generated text is required.
- Research and Experimentation: Provides a base for exploring the impact of explicit turn tokens on model behavior and performance in various NLP applications.