w95/megachat

Text generation · 1.1B parameters · BF16 · 2k context length · apache-2.0 · Transformer · Open weights

w95/megachat is a fine-tuned language model based on PY007/TinyLlama-1.1B-Chat-v0.3, developed by w95. It scores an average of 30.38 on the Open LLM Leaderboard, which spans benchmarks including ARC, HellaSwag, and MMLU. Its compact 1.1B-parameter size makes it suitable for general chat applications where efficient deployment matters. The model was trained with a learning rate of 0.0002 over 2000 steps.


Model Overview

w95/megachat is a fine-tuned language model derived from the PY007/TinyLlama-1.1B-Chat-v0.3 architecture. Developed by w95, this model is designed for chat-based applications, building upon the foundational capabilities of TinyLlama.

Performance Highlights

Evaluated on the Open LLM Leaderboard, w95/megachat achieves an overall average score of 30.38. Specific benchmark results include:

  • ARC (25-shot): 30.8
  • HellaSwag (10-shot): 54.35
  • MMLU (5-shot): 25.55
  • TruthfulQA (0-shot): 39.85
  • Winogrande (5-shot): 56.99
  • GSM8K (5-shot): 0.99
  • DROP (3-shot): 4.16
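The reported overall average can be reproduced directly from the per-benchmark scores above; it is the unweighted mean across the seven tasks:

```python
# Per-benchmark scores as reported on the card.
scores = {
    "ARC (25-shot)": 30.8,
    "HellaSwag (10-shot)": 54.35,
    "MMLU (5-shot)": 25.55,
    "TruthfulQA (0-shot)": 39.85,
    "Winogrande (5-shot)": 56.99,
    "GSM8K (5-shot)": 0.99,
    "DROP (3-shot)": 4.16,
}

# Unweighted mean over all seven benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 30.38
```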

Training Details

The model was trained using the following key hyperparameters:

  • Learning Rate: 0.0002
  • Training Steps: 2000
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • LR Scheduler Type: Cosine
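To make the schedule concrete, here is a minimal sketch of the learning rate a cosine scheduler would produce over the stated 2000 steps, starting from 0.0002. The card does not mention warmup, so none is modeled here; this is the standard cosine decay shape, not the exact training code:

```python
import math

def cosine_lr(step: int, total_steps: int = 2000, base_lr: float = 2e-4) -> float:
    """Cosine decay from base_lr down to 0 over total_steps.

    No warmup phase is modeled, since the card does not state one.
    """
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The LR starts at 2e-4, passes through 1e-4 at the midpoint,
# and decays to ~0 by step 2000.
for step in (0, 1000, 2000):
    print(step, cosine_lr(step))
```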

Intended Use Cases

Given its base model and fine-tuning, w95/megachat is primarily suited for:

  • General conversational AI
  • Lightweight chat applications
  • Exploratory natural language processing tasks where a smaller model is preferred for efficiency
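For the use cases above, a minimal inference sketch with the transformers `pipeline` API follows. The ChatML-style prompt format is an assumption carried over from the TinyLlama-1.1B-Chat-v0.3 base model; check the repository files if generations look malformed:

```python
def build_chatml_prompt(user_msg: str) -> str:
    """ChatML-style prompt, assumed from the TinyLlama-1.1B-Chat-v0.3 base.

    This format is an assumption, not confirmed by the card itself.
    """
    return (
        "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def generate(user_msg: str, max_new_tokens: int = 128) -> str:
    """Run one chat turn through w95/megachat.

    Downloads the checkpoint on first use (roughly 2.2 GB in BF16),
    so the import is kept local to this function.
    """
    from transformers import pipeline

    chat = pipeline("text-generation", model="w95/megachat")
    out = chat(build_chatml_prompt(user_msg), max_new_tokens=max_new_tokens)
    return out[0]["generated_text"]
```

At 1.1B parameters the model fits comfortably on a single consumer GPU, which is the main appeal for the lightweight deployments listed above.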