abhinand/TinyLlama-1.1B-OpenHermes-2.5-Chat-v0.1-sft
Hugging Face · Text Generation
Concurrency cost: 1 · Model size: 1.1B · Quantization: BF16 · Context length: 2k · Published: Feb 6, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

abhinand/TinyLlama-1.1B-OpenHermes-2.5-Chat-v0.1-sft is a 1.1 billion parameter causal language model developed by abhinand. It is a fine-tuned version of the TinyLlama base model, trained for a single epoch on the OpenHermes 2.5 and UltraChat 200k datasets. The model targets chat applications, offering a compact option for conversational AI with a 2048-token context window.

Model Overview

abhinand/TinyLlama-1.1B-OpenHermes-2.5-Chat-v0.1-sft is a 1.1 billion parameter language model built on the TinyLlama base architecture. It was fine-tuned for a single epoch on a combination of the OpenHermes 2.5 and UltraChat 200k datasets, both formatted as chat conversations. The fine-tuning run used an axolotl configuration with LoRA (Low-Rank Adaptation) adapters applied to selected attention and feed-forward modules for parameter-efficient training.
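
For illustration, here is a minimal sketch of an equivalent setup using the peft and transformers libraries rather than axolotl itself. The rank/alpha values, bf16 precision, 8-bit AdamW optimizer, cosine schedule, and single epoch come from the training details listed below; the base checkpoint name, target module list, dropout, and learning rate are assumptions, not values stated on the model card.

```python
# A minimal sketch of the LoRA fine-tuning setup described above, expressed
# with peft + transformers instead of axolotl. Values marked "assumed" are
# illustrative and not confirmed by the model card.
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"  # assumed base checkpoint
)

lora_config = LoraConfig(
    r=32,               # LoRA rank, per the model card
    lora_alpha=16,      # LoRA scaling factor, per the model card
    lora_dropout=0.05,  # assumed; not stated on the card
    task_type="CAUSAL_LM",
    # Assumed typical Llama attention + feed-forward projections:
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trainable

training_args = TrainingArguments(
    output_dir="tinyllama-openhermes-sft",  # assumed output path
    num_train_epochs=1,                     # single epoch, per the model card
    bf16=True,                              # bf16 precision, per the model card
    optim="adamw_bnb_8bit",                 # 8-bit AdamW, per the model card
    lr_scheduler_type="cosine",             # cosine schedule, per the model card
    learning_rate=2e-4,                     # assumed; not stated on the card
)
```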

Key Capabilities & Training Details

  • Architecture: Based on the TinyLlama 1.1B intermediate step model.
  • Fine-tuning Datasets: OpenHermes 2.5 and UltraChat 200k, both formatted for chat conversations.
  • Training Method: LoRA adapter with r=32 and alpha=16, applied to key attention and feed-forward layers.
  • Context Length: Supports a sequence length of 2048 tokens.
  • Optimization: Trained with bf16 precision, adamw_bnb_8bit optimizer, and a cosine learning rate scheduler.
  • Chat Template: Uses the chatml format for conversations (see the inference sketch after this list).
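
As a usage illustration, the following is a minimal inference sketch, assuming the repository's tokenizer ships the chatml chat template described above and that the standard transformers text-generation API applies. The prompt content is purely illustrative.

```python
# A minimal chat inference sketch, assuming the tokenizer carries the chatml
# chat template. Keep total prompt + output within the 2048-token context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abhinand/TinyLlama-1.1B-OpenHermes-2.5-Chat-v0.1-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a context window is in one sentence."},
]

# apply_chat_template renders the conversation in chatml form
# (<|im_start|>role ... <|im_end|>) and appends the assistant header.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```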

Performance Metrics

Evaluations on the Open LLM Leaderboard indicate the model's performance across various benchmarks:

  • Average Score: 36.59
  • AI2 Reasoning Challenge (25-shot): 33.79
  • HellaSwag (10-shot): 58.72
  • MMLU (5-shot): 24.52
  • TruthfulQA (0-shot): 36.22
  • Winogrande (5-shot): 60.93
  • GSM8k (5-shot): 5.38

These results, with the average computed as the unweighted mean of the six benchmark scores, provide insight into the model's reasoning, common-sense, and general-knowledge capabilities, particularly for a model of its size.