w95/megachat
w95/megachat is a fine-tuned language model based on PY007/TinyLlama-1.1B-Chat-v0.3, developed by w95. It achieves an average score of 30.38 across benchmarks including ARC, HellaSwag, and MMLU, and its compact 1.1B-parameter size makes it suitable for efficient deployment in general chat applications. The model was trained with a learning rate of 0.0002 for 2000 steps.
Model Overview
w95/megachat is a language model fine-tuned from PY007/TinyLlama-1.1B-Chat-v0.3. Developed by w95, it is designed for chat-based applications, building on the foundational capabilities of TinyLlama.
Performance Highlights
Evaluated on the Open LLM Leaderboard, w95/megachat achieves an overall average score of 30.38. Specific benchmark results include:
- ARC (25-shot): 30.8
- HellaSwag (10-shot): 54.35
- MMLU (5-shot): 25.55
- TruthfulQA (0-shot): 39.85
- Winogrande (5-shot): 56.99
- GSM8K (5-shot): 0.99
- DROP (3-shot): 4.16
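As a quick sanity check, the reported overall score is simply the arithmetic mean of the seven benchmark results above:

```python
# Illustrative check: the leaderboard average is the plain mean
# of the seven benchmark scores listed above.
scores = {
    "ARC": 30.8,
    "HellaSwag": 54.35,
    "MMLU": 25.55,
    "TruthfulQA": 39.85,
    "Winogrande": 56.99,
    "GSM8K": 0.99,
    "DROP": 4.16,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 30.38
```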
Training Details
The model was trained using the following key hyperparameters:
- Learning Rate: 0.0002
- Training Steps: 2000
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR Scheduler Type: Cosine
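The cosine scheduler above follows the standard cosine-decay formula: the learning rate starts at the base value and decays to (near) zero over the training run. A minimal sketch, assuming no warmup phase (the card does not state one) and a floor of zero:

```python
import math

def cosine_lr(step: int, total_steps: int = 2000, base_lr: float = 2e-4,
              min_lr: float = 0.0) -> float:
    """Standard cosine decay: base_lr at step 0, min_lr at total_steps.

    Illustrative only -- the actual training run may have used warmup
    or a different floor, which the model card does not specify.
    """
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0))     # 0.0002 (the reported learning rate)
print(cosine_lr(1000))  # halfway: 0.0001
```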
Intended Use Cases
Given its base model and fine-tuning, w95/megachat is primarily suited for:
- General conversational AI
- Lightweight chat applications
- Exploratory natural language processing tasks where a smaller model is preferred for efficiency
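A minimal usage sketch with Hugging Face transformers, assuming the standard AutoModelForCausalLM/AutoTokenizer interface; the generation parameters are illustrative, not the author's recommended settings:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_reply(prompt: str, model_id: str = "w95/megachat",
                   max_new_tokens: int = 128) -> str:
    """Generate a chat reply; downloads the model weights on first call."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:],
                            skip_special_tokens=True)
```

At 1.1B parameters, the model can run on a single consumer GPU or on CPU, which fits the lightweight use cases listed above.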