Overview
This model, mesolitica/malaysian-llama2-7b-32k-instructions, is a 7-billion-parameter Llama 2-based instruction-tuned model developed by Mesolitica. It was fine-tuned with QLoRA on a Malaysian-translated version of the UltraChat dataset. A key feature is its extended context window of 32,768 tokens, which allows it to process longer conversations and more complex prompts.
Key Capabilities
- Malaysian Language Proficiency: Optimized for generating responses and understanding queries in the Malaysian language.
- Chat Completions: Follows the Llama 2 chat template for effective conversational interactions.
- Extended Context: Supports a 32k token context length, enabling more coherent and context-aware responses over longer dialogues.
- Quantization: Utilizes 4-bit quantization (NF4) with double quantization and bfloat16 compute dtype for efficient deployment.
- Flash Attention 2: Incorporates Flash Attention 2 for potentially faster inference.
Good For
- Developing chatbots and conversational agents for Malaysian-speaking users.
- Applications requiring long-context understanding and generation in Malaysian.
- Research and development in low-resource language NLP, specifically for Malaysian.
- Use cases where efficient deployment of a 7B parameter model with extended context is crucial.
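For the chatbot use cases above, prompts should follow the Llama 2 chat template the model was tuned on. A minimal single-turn formatting sketch (the helper name is hypothetical; the `<s>[INST] ... [/INST]` and `<<SYS>>` markers are the standard Llama 2 template):

```python
def format_llama2_prompt(user_message: str, system_prompt: str = "") -> str:
    """Format a single-turn prompt in the Llama 2 chat template."""
    if system_prompt:
        # System prompt is wrapped in <<SYS>> markers inside the first [INST] block.
        return (
            f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"<s>[INST] {user_message} [/INST]"

prompt = format_llama2_prompt("apa khabar?")
# → "<s>[INST] apa khabar? [/INST]"
```

The model's response is generated after the closing `[/INST]`; for multi-turn chat, prior turns are appended as further `[INST] ... [/INST]` pairs within the 32k-token window.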