pankajmathur/orca_mini_7b
pankajmathur/orca_mini_7b is a 7-billion-parameter OpenLLaMa-based model developed by Pankaj Mathur, fine-tuned on explain-tuned datasets derived from WizardLM, Alpaca, and Dolly-V2 using the dataset construction methods of the Orca research paper. The model is designed to learn the 'thought process' of a teacher model (ChatGPT, gpt-3.5-turbo-0301) by leveraging system instructions, making it proficient at following complex instructions and generating detailed explanations. It is optimized for tasks that require nuanced understanding and response generation driven by system prompts.
Model Overview
pankajmathur/orca_mini_7b is a 7 billion parameter model built upon the OpenLLaMa-7B architecture. Developed by Pankaj Mathur, this model distinguishes itself through its unique training methodology, which focuses on explain-tuned datasets.
Key Capabilities & Training
The model was trained using a combination of WizardLM, Alpaca, and Dolly-V2 datasets. Crucially, it incorporates dataset construction approaches from the Orca Research Paper, leveraging all 15 system instructions provided in the paper. This method aims to teach the model the 'thought process' of a teacher model, specifically ChatGPT (gpt-3.5-turbo-0301), by integrating system prompts before each instruction.
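The "system prompt before each instruction" scheme described above can be sketched as a small prompt-building helper. The `### System / ### User / ### Response` template below is an assumption based on the common orca_mini prompt layout, not a quote from this document; verify the exact template against the model card before use:

```python
# Sketch of prepending an Orca-style system instruction to a user instruction.
# The exact "### System / ### User / ### Response" template is an assumption;
# check the model card for the authoritative format.

def build_prompt(system: str, instruction: str) -> str:
    """Combine a system instruction and a user instruction into one prompt."""
    return f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

# A hypothetical system instruction of the kind used to elicit detailed answers:
system = ("You are an AI assistant that helps people find information. "
          "Provide a detailed answer so the user doesn't need to search elsewhere.")
prompt = build_prompt(system, "Why is the sky blue?")
print(prompt)

# To actually generate (requires downloading the 7B weights):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("pankajmathur/orca_mini_7b")
# model = AutoModelForCausalLM.from_pretrained("pankajmathur/orca_mini_7b")
```

During explain-tuning, pairing each instruction with a varied system prompt is what exposes the student model to the teacher's step-by-step reasoning style.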
Training was conducted on 8x A100 (80G) GPUs for approximately 7 hours, utilizing DeepSpeed with ZeRO stage 3 for efficient data parallelism. Key training parameters included a batch size of 32, a learning rate of 2e-5, and a maximum sequence length of 1024 tokens over 3 epochs.
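The reported hyperparameters can be summarized as a configuration sketch. This is an illustration assembled from the numbers above, not the authors' actual config; the key names mirror common DeepSpeed/Trainer conventions and the per-device split and precision setting are assumptions:

```python
# Sketch of the reported training setup: global batch size 32 across 8x A100,
# lr 2e-5, max sequence length 1024, 3 epochs, DeepSpeed ZeRO stage 3.
# Key names follow DeepSpeed JSON conventions; this is illustrative only.

train_hparams = {
    "per_device_train_batch_size": 32 // 8,  # assumed even split over 8 GPUs
    "learning_rate": 2e-5,
    "max_seq_length": 1024,
    "num_train_epochs": 3,
}

deepspeed_config = {
    "train_batch_size": 32,
    # ZeRO stage 3 partitions optimizer state, gradients, AND parameters
    # across GPUs, which is what makes 7B training fit on the cluster.
    "zero_optimization": {"stage": 3},
}
```

ZeRO stage 3 is the most aggressive sharding level: unlike stages 1 and 2, it also partitions the model parameters themselves, trading extra communication for much lower per-GPU memory.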
Performance & Limitations
Evaluations on the Open LLM Leaderboard report an average score of 37.39 (a second leaderboard entry lists 41.27), with ARC (25-shot) at 43.94 and HellaSwag (10-shot) at 65.22. The model may produce factually incorrect outputs and should not be relied on for factual accuracy. Because it was trained on public datasets, it may also generate biased or offensive content.