WikiLlama: Enhanced TinyLlama for General NLP
WikiLlama is a 1.1-billion-parameter language model by Rudransh Joshi, produced by applying LoRA (Low-Rank Adaptation) fine-tuning to the TinyLlama/TinyLlama-1.1B-Chat-v1.0 base model on the WikiText-103 dataset, with the goal of improving the model's general natural language processing performance.
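To make the LoRA idea concrete, here is a minimal NumPy sketch of a single adapted layer: the frozen base weight `W` is augmented by a trainable low-rank product `B @ A`, scaled by `alpha / r`. The dimensions and hyperparameters below are toy values for illustration only; the model card does not state the actual rank or target modules used for WikiLlama.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 4, 2   # toy dimensions; TinyLlama's layers are far larger
alpha = 16                 # hypothetical LoRA scaling factor

W = rng.standard_normal((d_out, d_in))   # frozen base weight (never updated)
A = rng.standard_normal((r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection (zero init)

def lora_forward(x):
    # Base path plus the low-rank update, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted layer exactly matches the base layer,
# so fine-tuning starts from the base model's behavior.
assert np.allclose(lora_forward(x), W @ x)
```

Only `A` and `B` are trained, so the fine-tuned checkpoint stores `r * (d_in + d_out)` extra parameters per layer instead of a full `d_out * d_in` weight copy, which is why LoRA checkpoints stay small.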
Key Capabilities & Performance
- Improved Accuracy: On a 100-example sample of the HellaSwag benchmark (sentence completion posed as multiple choice), WikiLlama reached 30% accuracy versus the base TinyLlama's 24%, a 6-percentage-point absolute improvement.
- Efficient Fine-tuning: LoRA freezes the base model's weights and trains only small low-rank adapter matrices, which keeps fine-tuning cheap and the resulting checkpoints compact.
- Base Model Compatibility: Retains the core architecture and characteristics of the TinyLlama-1.1B-Chat-v1.0 model, making it suitable for applications where a compact yet capable model is desired.
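The HellaSwag-style evaluation can be sketched as multiple-choice scoring: each candidate ending is scored by its average per-token log-probability under the model, and the highest-scoring ending is selected. This is a common protocol for such benchmarks, not necessarily the exact setup used here; `toy_logprob` below is a stand-in for a real language model's scoring function.

```python
def choose_ending(context, endings, token_logprob):
    """Pick the ending with the highest average per-token log-probability,
    a common scoring rule for HellaSwag-style multiple choice."""
    def score(ending):
        tokens = ending.split()  # toy whitespace tokenizer for illustration
        lps = [token_logprob(context, tok) for tok in tokens]
        return sum(lps) / len(lps)
    scores = [score(e) for e in endings]
    return max(range(len(endings)), key=lambda i: scores[i])

# Toy stand-in for a language model: assigns higher log-probability to "cake".
def toy_logprob(context, token):
    return -1.0 if token == "cake" else -3.0

idx = choose_ending(
    "She put the batter in the oven and baked a",
    ["cake for the party", "storm over the sea"],
    toy_logprob,
)
print(idx)  # → 0 (the more plausible ending wins)
```

Accuracy is then simply the fraction of examples where the chosen index matches the labeled ending, which is how the 30% vs. 24% figures above would be computed.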
Use Cases
WikiLlama is particularly well-suited for scenarios requiring a small, efficient language model with enhanced general NLP understanding. Its improved performance on tasks like sentence completion and multiple-choice questions makes it a good candidate for:
- Text Generation: Generating coherent and contextually relevant text.
- Question Answering: Answering questions based on provided context.
- Educational Applications: Tasks involving understanding and completing sentences or choosing correct options.
- Resource-Constrained Environments: Deployments where computational resources are limited, benefiting from its 1.1B parameter size.
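Since WikiLlama retains the TinyLlama-1.1B-Chat-v1.0 base, prompts for the use cases above would follow the base model's Zephyr-style chat format. The helper below is an illustrative sketch of that format, assuming WikiLlama inherits it unchanged; in practice, `tokenizer.apply_chat_template` from the base tokenizer is the safer way to build prompts.

```python
def format_chat(system, user):
    # Zephyr-style chat template used by TinyLlama-1.1B-Chat-v1.0;
    # assumed unchanged for WikiLlama since LoRA leaves the base intact.
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = format_chat(
    "You are a helpful assistant.",
    "Complete the sentence: The capital of France is",
)
```

The generated string ends with the `<|assistant|>` turn marker, so the model's completion begins immediately after it.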