vikash06/llama-2-7b-small-model-new
vikash06/llama-2-7b-small-model-new is a 7-billion-parameter Llama 2-based language model, fine-tuned on a small, curated dataset to evaluate performance under limited-data conditions. The model targets a range of natural language processing tasks, including creative writing, closed and open question answering, summarization, information extraction, classification, and brainstorming, and serves as an experiment in how effective longer training on a smaller, high-quality dataset can be for specific applications.
Model Overview
This model, vikash06/llama-2-7b-small-model-new, is a 7-billion-parameter Llama 2 variant fine-tuned on a small, experimental dataset. The primary goal of the project was to assess the performance implications of training longer on a more constrained dataset.
Key Capabilities
The model is designed to handle a variety of natural language tasks (see the usage sketch after this list), including:
- Creative Writing: Generating open-ended, creative responses based on specific instructions.
- Closed QA: Providing factually correct answers from a given passage of text.
- Open QA: Answering questions using general world knowledge or requiring minimal external search.
- Summarization: Condensing paragraphs from source texts like Wikipedia.
- Information Extraction: Identifying and extracting specific details from provided passages.
- Classification: Categorizing entities based on given lists or examples.
- Brainstorming: Generating multiple ideas in response to a prompt.
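
Since the model is a standard causal language model hosted on the Hugging Face Hub, it can be loaded with transformers. The snippet below is a minimal sketch: the prompt and generation settings (temperature, token budget) are illustrative assumptions, not recommendations from the model author.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikash06/llama-2-7b-small-model-new"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so the 7B model fits on one GPU
    device_map="auto",
)

# Illustrative creative-writing instruction; any of the task types above works.
prompt = "Write a short story about a lighthouse keeper who discovers a map."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```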
Performance and Evaluation
Evaluations were conducted with EleutherAI's lm-evaluation-harness; on the HellaSwag task the model scores 72.35 (a reproduction sketch follows the benchmark list below). The Open LLM Leaderboard reports an average score of 46.62 across benchmarks including:
- AI2 Reasoning Challenge (25-Shot): 45.22
- MMLU (5-Shot): 46.23
- TruthfulQA (0-shot): 42.46
- Winogrande (5-shot): 63.93
- GSM8k (5-shot): 9.55
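
The HellaSwag run can be reproduced through the harness's Python API. This is a sketch assuming lm-evaluation-harness v0.4+, where `simple_evaluate` is the public entry point; the batch size and dtype are assumptions, not settings from the card.

```python
# Sketch: reproducing the HellaSwag evaluation with EleutherAI's
# lm-evaluation-harness (assumes v0.4+, installed via `pip install lm-eval`).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face causal LM backend
    model_args="pretrained=vikash06/llama-2-7b-small-model-new,dtype=float16",
    tasks=["hellaswag"],
    batch_size=8,  # assumption; tune to available GPU memory
)
print(results["results"]["hellaswag"])  # the card reports 72.35 on this task
```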
Training Details
The model was fine-tuned with the torch, transformers, peft, bitsandbytes, and trl libraries. Training used 1,000 carefully selected samples per task category and ran for 50 epochs with a batch size of 2. The run took 28 hours on NVIDIA A6000 48 GB GPUs, with a reported carbon figure of 0.432 kg/kWh.
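
For reference, that library stack corresponds to the common QLoRA-style supervised fine-tuning recipe. The sketch below shows how the pieces typically fit together; it is not the author's actual training script. The base checkpoint, dataset file, learning rate, and LoRA hyperparameters are placeholders; only the epoch count (50) and per-device batch size (2) come from the card. It also assumes a trl version in which SFTTrainer accepts transformers.TrainingArguments and a `dataset_text_field` argument.

```python
# Hypothetical QLoRA-style fine-tuning sketch using the stack named above
# (torch, transformers, peft, bitsandbytes, trl). Base model and dataset
# names are placeholders, not the author's actual values.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig
from trl import SFTTrainer

base_model = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint

# 4-bit quantization via bitsandbytes so the 7B model fits on a 48 GB A6000.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# LoRA adapter config (peft); r/alpha/dropout are illustrative defaults.
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
)

# Placeholder path standing in for the ~1,000 curated samples per category.
dataset = load_dataset("json", data_files="curated_samples.jsonl")["train"]

args = TrainingArguments(
    output_dir="llama-2-7b-small-model",
    num_train_epochs=50,            # from the card
    per_device_train_batch_size=2,  # from the card
    learning_rate=2e-4,             # assumption
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",  # assumes one flattened text column
)
trainer.train()
```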