kevin009/lamatama: A Chat-Optimized Language Model
kevin009/lamatama is a language model developed by kevin009 for natural language understanding and generation, particularly in conversational contexts. It adopts the Llama 2 architecture and tokenizer, so it remains compatible with the broad ecosystem of Llama-based tooling.
Key Capabilities & Training:
- Extensive Pretraining: The base model was pretrained on 3 trillion tokens, a scale that supports a deep and nuanced understanding of language.
- Efficient Training: The pretraining run took roughly 90 days on 16 A100-40G GPUs.
- Chat Fine-tuning: This version is fine-tuned for chat-based applications on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T.
- Alignment: Initial fine-tuning used a variant of the UltraChat dataset (synthetic dialogues generated by ChatGPT), followed by preference alignment with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset (64k prompts with responses ranked by GPT-4).
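TinyLlama chat variants tuned on UltraChat/UltraFeedback typically use the Zephyr-style chat template (`<|system|>`, `<|user|>`, `<|assistant|>` markers with `</s>` separators). The card does not state lamatama's exact template, so the sketch below assumes that convention for illustration; in practice, load the model's tokenizer and call `tokenizer.apply_chat_template()` rather than hand-rolling the string.

```python
# Sketch of a Zephyr-style chat prompt, as commonly used by TinyLlama-Chat
# models. Assumption: lamatama inherits this template from its base model.

def build_prompt(messages):
    """Render a list of {role, content} dicts into a single prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}</s>\n")
    parts.append("<|assistant|>\n")  # cue the model to start its reply
    return "".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about the sea."},
])
print(prompt)
```

The trailing `<|assistant|>` marker is left open deliberately: generation is conditioned on everything up to that point, and the model completes the assistant turn.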
Performance Highlights:
On the Open LLM Leaderboard, kevin009/lamatama achieved an average score of 37.15.
- AI2 Reasoning Challenge (25-Shot): 36.35
- HellaSwag (10-Shot): 61.12
- MMLU (5-Shot): 24.72
- TruthfulQA (0-shot): 37.67
- Winogrande (5-shot): 60.77
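Note that the mean of the five scores listed above comes out higher than the reported 37.15 average; the leaderboard average also folds in tasks not listed here (its full suite additionally includes GSM8K). A quick arithmetic check:

```python
# Scores listed above for kevin009/lamatama on the Open LLM Leaderboard.
scores = {
    "ARC (25-shot)": 36.35,
    "HellaSwag (10-shot)": 61.12,
    "MMLU (5-shot)": 24.72,
    "TruthfulQA (0-shot)": 37.67,
    "Winogrande (5-shot)": 60.77,
}

# Mean of just these five tasks (the leaderboard's 37.15 averages more tasks).
five_task_mean = sum(scores.values()) / len(scores)
print(round(five_task_mean, 2))  # 44.13
```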
Ideal Use Cases:
- Chatbots and Conversational AI: Its fine-tuning makes it highly suitable for engaging in natural, coherent dialogues.
- Text Generation: Capable of generating human-like text for various applications.
- Language Understanding: Its extensive pretraining supports nuanced comprehension of language.