pankajmathur/orca_mini_13b
pankajmathur/orca_mini_13b is a 13 billion parameter OpenLLaMa-based model developed by Pankaj Mathur. It is fine-tuned on explain-tuned datasets derived from WizardLM, Alpaca, and Dolly-V2, using the dataset construction approaches from the Orca Research Paper. The model is designed to learn the 'thought process' of a teacher model (ChatGPT), making it suitable for tasks requiring detailed explanations and instruction following. It features a context length of 4096 tokens.
Model Overview
pankajmathur/orca_mini_13b is a 13 billion parameter language model built on the OpenLLaMa-13B architecture. Developed by Pankaj Mathur, the model is distinguished by its training methodology, which incorporates dataset construction approaches from the Orca Research Paper.
Key Capabilities & Training
The model was fine-tuned on a combination of explain-tuned datasets drawn from WizardLM (70K examples), Alpaca (52K examples), and Dolly-V2 (~15K examples). A core aspect of its training was the use of all 15 system instructions from the Orca Research Paper to generate custom datasets: a system prompt is prepended to each instruction so that the model can learn the 'thought process' of a teacher model, ChatGPT (gpt-3.5-turbo-0301).
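In practice, this means each training (and inference) example is wrapped in a system/user/response template. The sketch below follows the "### System: / ### User: / ### Response:" layout published for the orca_mini family; the optional "### Input:" field and the exact wording are illustrative and should be verified against the model card:

```python
# Illustrative prompt construction for the orca_mini template;
# field wording follows the published orca_mini prompt layout.
def build_prompt(system: str, instruction: str, input_text: str = "") -> str:
    prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
    if input_text:  # optional context field
        prompt += f"### Input:\n{input_text}\n\n"
    prompt += "### Response:\n"
    return prompt

print(build_prompt(
    "You are an AI assistant that follows instruction extremely well. Help as much as you can.",
    "Explain photosynthesis to a five-year-old.",
))
```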
Training was conducted on 8x A100 (80 GB) GPUs for approximately 15 hours, using DeepSpeed with ZeRO stage 3 for fully sharded data parallelism. Key hyperparameters included a batch size of 16, a learning rate of 2e-5, and a maximum sequence length of 1024 tokens over 3 epochs.
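As a rough sketch only, these reported hyperparameters map onto a Hugging Face TrainingArguments configuration along the following lines; the per-device batch split, mixed-precision choice, and DeepSpeed config path are assumptions, not published values:

```python
# Illustrative training configuration mirroring the reported hyperparameters.
# Dataset and model setup are omitted; "ds_zero3_config.json" is a hypothetical
# DeepSpeed ZeRO stage 3 config file, not one shipped with the model.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="orca_mini_13b_ft",
    num_train_epochs=3,                # reported: 3 epochs
    per_device_train_batch_size=2,     # assumption: 2 per GPU x 8 GPUs = effective batch size 16
    learning_rate=2e-5,                # reported learning rate
    bf16=True,                         # assumption: bf16 mixed precision on A100s
    deepspeed="ds_zero3_config.json",  # hypothetical ZeRO stage 3 config
    save_strategy="epoch",
    logging_steps=10,
)
# The reported maximum sequence length of 1024 tokens would be enforced at tokenization time.
```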
Performance & Limitations
On the Open LLM Leaderboard, the model achieved an average score of 37.06, including 42.06 on ARC (25-shot) and 63.4 on HellaSwag (10-shot). Note that the model can produce factually incorrect output and may generate biased or offensive content reflecting its training data. Users should exercise caution and not rely on it for factually accurate information.
Good for
- Instruction-following tasks where the model needs to understand and respond based on a system prompt.
- Generating detailed explanations by mimicking the 'thought process' learned from a teacher model.
- Experimentation with Orca-style fine-tuning on an OpenLLaMa base.
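To try the model, the following is a minimal inference sketch assuming the standard Hugging Face transformers loading path; the system prompt and generation settings are illustrative, not values prescribed by the model card:

```python
# Minimal inference sketch; generation parameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pankajmathur/orca_mini_13b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: fp16 to fit a 13B model on a single large GPU
    device_map="auto",
)

# System prompt + instruction, following the orca_mini prompt layout sketched above.
system = "You are an AI assistant that follows instruction extremely well. Help as much as you can."
instruction = "Explain why the sky is blue."
prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)

# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```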