WikiLlama: Enhanced TinyLlama for General NLP
WikiLlama is a 1.1-billion-parameter language model by Rudransh Joshi, produced by applying LoRA (Low-Rank Adaptation) fine-tuning to the TinyLlama/TinyLlama-1.1B-Chat-v1.0 base model on the WikiText-103 dataset, with the goal of improving the model's general natural language processing performance.
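To make the LoRA idea concrete, here is a minimal NumPy sketch of a single adapted layer: the frozen base weight `W` is augmented by a trainable low-rank product `B @ A`, scaled by `alpha / r`. The dimensions and hyperparameters below are toy values for illustration only; the model card does not state the actual rank or target modules used for WikiLlama.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 4, 2   # toy dimensions; TinyLlama's layers are far larger
alpha = 16                 # hypothetical LoRA scaling factor

W = rng.standard_normal((d_out, d_in))   # frozen base weight (never updated)
A = rng.standard_normal((r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection (zero init)

def lora_forward(x):
    # Base path plus the low-rank update, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted layer exactly matches the base layer,
# so fine-tuning starts from the base model's behavior.
assert np.allclose(lora_forward(x), W @ x)
```

Only `A` and `B` are trained, so the fine-tuned checkpoint stores `r * (d_in + d_out)` extra parameters per layer instead of a full `d_out * d_in` weight copy, which is why LoRA checkpoints stay small.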
Key Capabilities & Performance
- Improved Accuracy: On a 100-example sample of the HellaSwag benchmark (sentence completion posed as multiple choice), WikiLlama reached 30% accuracy versus the base TinyLlama's 24%, a 6-percentage-point absolute improvement.
- Efficient Fine-tuning: LoRA freezes the base model's weights and trains only small low-rank adapter matrices, which keeps fine-tuning cheap and the resulting checkpoints compact.
- Base Model Compatibility: Retains the core architecture and characteristics of the TinyLlama-1.1B-Chat-v1.0 model, making it suitable for applications where a compact yet capable model is desired.
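The HellaSwag-style evaluation can be sketched as multiple-choice scoring: each candidate ending is scored by its average per-token log-probability under the model, and the highest-scoring ending is selected. This is a common protocol for such benchmarks, not necessarily the exact setup used here; `toy_logprob` below is a stand-in for a real language model's scoring function.

```python
def choose_ending(context, endings, token_logprob):
    """Pick the ending with the highest average per-token log-probability,
    a common scoring rule for HellaSwag-style multiple choice."""
    def score(ending):
        tokens = ending.split()  # toy whitespace tokenizer for illustration
        lps = [token_logprob(context, tok) for tok in tokens]
        return sum(lps) / len(lps)
    scores = [score(e) for e in endings]
    return max(range(len(endings)), key=lambda i: scores[i])

# Toy stand-in for a language model: assigns higher log-probability to "cake".
def toy_logprob(context, token):
    return -1.0 if token == "cake" else -3.0

idx = choose_ending(
    "She put the batter in the oven and baked a",
    ["cake for the party", "storm over the sea"],
    toy_logprob,
)
print(idx)  # → 0 (the more plausible ending wins)
```

Accuracy is then simply the fraction of examples where the chosen index matches the labeled ending, which is how the 30% vs. 24% figures above would be computed.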
Use Cases
WikiLlama is particularly well-suited for scenarios requiring a small, efficient language model with enhanced general NLP understanding. Its improved performance on tasks like sentence completion and multiple-choice questions makes it a good candidate for:
- Text Generation: Generating coherent and contextually relevant text.
- Question Answering: Answering questions based on provided context.
- Educational Applications: Tasks involving understanding and completing sentences or choosing correct options.
- Resource-Constrained Environments: Deployments where computational resources are limited, benefiting from its 1.1B parameter size.
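Since WikiLlama retains the TinyLlama-1.1B-Chat-v1.0 base, prompts for the use cases above would follow the base model's Zephyr-style chat format. The helper below is an illustrative sketch of that format, assuming WikiLlama inherits it unchanged; in practice, `tokenizer.apply_chat_template` from the base tokenizer is the safer way to build prompts.

```python
def format_chat(system, user):
    # Zephyr-style chat template used by TinyLlama-1.1B-Chat-v1.0;
    # assumed unchanged for WikiLlama since LoRA leaves the base intact.
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = format_chat(
    "You are a helpful assistant.",
    "Complete the sentence: The capital of France is",
)
```

The generated string ends with the `<|assistant|>` turn marker, so the model's completion begins immediately after it.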