pankajmathur/orca_mini_13b
pankajmathur/orca_mini_13b is a 13 billion parameter OpenLLaMa-based model developed by Pankaj Mathur. It is fine-tuned on explain-tuned datasets derived from WizardLM, Alpaca, and Dolly-V2, using the dataset construction approaches from the Orca Research Paper. The model is designed to learn the 'thought process' of a teacher model (ChatGPT), making it suitable for tasks requiring detailed explanations and instruction following. It features a context length of 4096 tokens.
Model Overview
pankajmathur/orca_mini_13b is a 13 billion parameter language model built on the OpenLLaMa-13B architecture. Developed by Pankaj Mathur, the model is distinguished by its training methodology, which incorporates dataset construction approaches from the Orca Research Paper.
Key Capabilities & Training
The model was fine-tuned on a combination of explain-tuned datasets drawn from WizardLM (70K examples), Alpaca (52K examples), and Dolly-V2 (~15K examples). A core aspect of its training was the use of all 15 system instructions from the Orca Research Paper to generate custom datasets: a system prompt is prepended to each instruction so that the model can learn the 'thought process' of a teacher model, ChatGPT (gpt-3.5-turbo-0301).
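In practice, this means each training (and inference) example is wrapped in a system/user/response template. The sketch below follows the "### System: / ### User: / ### Response:" layout published for the orca_mini family; the optional "### Input:" field and the exact wording are illustrative and should be verified against the model card:

```python
# Illustrative prompt construction for the orca_mini template;
# field wording follows the published orca_mini prompt layout.
def build_prompt(system: str, instruction: str, input_text: str = "") -> str:
    prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
    if input_text:  # optional context field
        prompt += f"### Input:\n{input_text}\n\n"
    prompt += "### Response:\n"
    return prompt

print(build_prompt(
    "You are an AI assistant that follows instruction extremely well. Help as much as you can.",
    "Explain photosynthesis to a five-year-old.",
))
```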
Training was conducted on 8x A100 (80 GB) GPUs for approximately 15 hours, using DeepSpeed with ZeRO stage 3 for fully sharded data parallelism. Key hyperparameters included a batch size of 16, a learning rate of 2e-5, and a maximum sequence length of 1024 tokens over 3 epochs.
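As a rough sketch only, these reported hyperparameters map onto a Hugging Face TrainingArguments configuration along the following lines; the per-device batch split, mixed-precision choice, and DeepSpeed config path are assumptions, not published values:

```python
# Illustrative training configuration mirroring the reported hyperparameters.
# Dataset and model setup are omitted; "ds_zero3_config.json" is a hypothetical
# DeepSpeed ZeRO stage 3 config file, not one shipped with the model.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="orca_mini_13b_ft",
    num_train_epochs=3,                # reported: 3 epochs
    per_device_train_batch_size=2,     # assumption: 2 per GPU x 8 GPUs = effective batch size 16
    learning_rate=2e-5,                # reported learning rate
    bf16=True,                         # assumption: bf16 mixed precision on A100s
    deepspeed="ds_zero3_config.json",  # hypothetical ZeRO stage 3 config
    save_strategy="epoch",
    logging_steps=10,
)
# The reported maximum sequence length of 1024 tokens would be enforced at tokenization time.
```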
Performance & Limitations
On the Open LLM Leaderboard, the model achieved an average score of 37.06, including 42.06 on ARC (25-shot) and 63.4 on HellaSwag (10-shot). Note that the model can produce factually incorrect output and may generate biased or offensive content reflecting its training data. Users should exercise caution and not rely on it for factually accurate information.
Good for
- Instruction-following tasks where the model needs to understand and respond based on a system prompt.
- Generating detailed explanations by mimicking the 'thought process' learned from a teacher model.
- Experimentation with Orca-style fine-tuning on an OpenLLaMa base.
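To try the model, the following is a minimal inference sketch assuming the standard Hugging Face transformers loading path; the system prompt and generation settings are illustrative, not values prescribed by the model card:

```python
# Minimal inference sketch; generation parameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pankajmathur/orca_mini_13b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: fp16 to fit a 13B model on a single large GPU
    device_map="auto",
)

# System prompt + instruction, following the orca_mini prompt layout sketched above.
system = "You are an AI assistant that follows instruction extremely well. Help as much as you can."
instruction = "Explain why the sky is blue."
prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)

# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```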