w95/megachat

Text generation · 1.1B parameters · BF16 · 2k context length · apache-2.0 · Transformer · Open weights

w95/megachat is a fine-tuned language model based on PY007/TinyLlama-1.1B-Chat-v0.3, developed by w95. It scores an average of 30.38 on the Open LLM Leaderboard, which spans benchmarks including ARC, HellaSwag, and MMLU. Its compact 1.1B-parameter size makes it suitable for general chat applications where efficient deployment matters. The model was trained with a learning rate of 0.0002 over 2000 steps.


Model Overview

w95/megachat is a fine-tuned language model derived from the PY007/TinyLlama-1.1B-Chat-v0.3 architecture. Developed by w95, this model is designed for chat-based applications, building upon the foundational capabilities of TinyLlama.

Performance Highlights

Evaluated on the Open LLM Leaderboard, w95/megachat achieves an overall average score of 30.38. Specific benchmark results include:

  • ARC (25-shot): 30.8
  • HellaSwag (10-shot): 54.35
  • MMLU (5-shot): 25.55
  • TruthfulQA (0-shot): 39.85
  • Winogrande (5-shot): 56.99
  • GSM8K (5-shot): 0.99
  • DROP (3-shot): 4.16
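The reported overall average can be reproduced directly from the per-benchmark scores above; it is the unweighted mean across the seven tasks:

```python
# Per-benchmark scores as reported on the card.
scores = {
    "ARC (25-shot)": 30.8,
    "HellaSwag (10-shot)": 54.35,
    "MMLU (5-shot)": 25.55,
    "TruthfulQA (0-shot)": 39.85,
    "Winogrande (5-shot)": 56.99,
    "GSM8K (5-shot)": 0.99,
    "DROP (3-shot)": 4.16,
}

# Unweighted mean over all seven benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 30.38
```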

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 0.0002
  • Training Steps: 2000
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • LR Scheduler Type: Cosine
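To make the schedule concrete, here is a minimal sketch of the learning rate a cosine scheduler would produce over the stated 2000 steps, starting from 0.0002. The card does not mention warmup, so none is modeled here; this is the standard cosine decay shape, not the exact training code:

```python
import math

def cosine_lr(step: int, total_steps: int = 2000, base_lr: float = 2e-4) -> float:
    """Cosine decay from base_lr down to 0 over total_steps.

    No warmup phase is modeled, since the card does not state one.
    """
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The LR starts at 2e-4, passes through 1e-4 at the midpoint,
# and decays to ~0 by step 2000.
for step in (0, 1000, 2000):
    print(step, cosine_lr(step))
```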

Intended Use Cases

Given its base model and fine-tuning, w95/megachat is primarily suited for:

  • General conversational AI
  • Lightweight chat applications
  • Exploratory natural language processing tasks where a smaller model is preferred for efficiency
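For the use cases above, a minimal inference sketch with the transformers `pipeline` API follows. The ChatML-style prompt format is an assumption carried over from the TinyLlama-1.1B-Chat-v0.3 base model; check the repository files if generations look malformed:

```python
def build_chatml_prompt(user_msg: str) -> str:
    """ChatML-style prompt, assumed from the TinyLlama-1.1B-Chat-v0.3 base.

    This format is an assumption, not confirmed by the card itself.
    """
    return (
        "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def generate(user_msg: str, max_new_tokens: int = 128) -> str:
    """Run one chat turn through w95/megachat.

    Downloads the checkpoint on first use (roughly 2.2 GB in BF16),
    so the import is kept local to this function.
    """
    from transformers import pipeline

    chat = pipeline("text-generation", model="w95/megachat")
    out = chat(build_chatml_prompt(user_msg), max_new_tokens=max_new_tokens)
    return out[0]["generated_text"]
```

At 1.1B parameters the model fits comfortably on a single consumer GPU, which is the main appeal for the lightweight deployments listed above.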