kevin009/lamatama

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.1BQuant:BF16Ctx Length:2kPublished:Jan 12, 2024License:apache-2.0Architecture:Transformer Open Weights Warm

kevin009/lamatama is a language model built on the Llama 2 architecture, pretrained on 3 trillion tokens over 90 days. This model is fine-tuned for chat-based applications, leveraging the TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T model and optimized with HF's Zephyr's training recipe. It excels in natural language understanding and generation for conversational AI, achieving an average score of 37.15 on the Open LLM Leaderboard.

Loading preview...

kevin009/lamatama: A Chat-Optimized Language Model

kevin009/lamatama is a language model developed by kevin009, designed for advanced natural language understanding and generation, particularly in conversational contexts. It is built upon the Llama 2 architecture and utilizes its tokenizer, ensuring broad compatibility.

Key Capabilities & Training:

  • Extensive Pretraining: The model was pretrained on an impressive 3 trillion tokens, a scale that enables a deep and nuanced understanding of language.
  • Efficient Training: The pretraining process spanned 90 days, utilizing 16 A100-40G GPUs, highlighting optimized training methodologies.
  • Chat Fine-tuning: This specific version is fine-tuned for chat-based applications, building on TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T.
  • Alignment: Initial fine-tuning used a variant of the UltraChat dataset (synthetic dialogues by ChatGPT), followed by further alignment with 🤗 TRL's DPOTrainer and the openbmb/UltraFeedback dataset (64k prompts ranked by GPT-4).

Performance Highlights:

On the Open LLM Leaderboard, kevin009/lamatama achieved an average score of 37.15.

  • AI2 Reasoning Challenge (25-Shot): 36.35
  • HellaSwag (10-Shot): 61.12
  • MMLU (5-Shot): 24.72
  • TruthfulQA (0-shot): 37.67
  • Winogrande (5-shot): 60.77

Ideal Use Cases:

  • Chatbots and Conversational AI: Its fine-tuning makes it highly suitable for engaging in natural, coherent dialogues.
  • Text Generation: Capable of generating human-like text for various applications.
  • Language Understanding: Benefits from its extensive pretraining for nuanced comprehension of language.