pankajmathur/orca_mini_v2_7b

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Jul 3, 2023 · License: cc-by-nc-sa-4.0 · Architecture: Transformer · Open weights

pankajmathur/orca_mini_v2_7b is a 7 billion parameter LLaMA-based instruction-tuned language model developed by Pankaj Mathur in collaboration with Eric Hartford. Fine-tuned on uncensored WizardLM, Alpaca, and Dolly-V2 datasets using the dataset-construction approach from the Orca research paper, it demonstrates improved code generation compared to its predecessor. The model is designed to learn thought processes from a teacher model (ChatGPT) through system prompts, making it suitable for tasks requiring detailed explanations and instruction following.


Model Overview

pankajmathur/orca_mini_v2_7b is a 7 billion parameter LLaMA-based model, developed by Pankaj Mathur in collaboration with Eric Hartford. It is an instruction-tuned model, trained with an uncensoring script applied on top of explain-tuned datasets: WizardLM (~70K), Alpaca (~52K), and Dolly-V2 (~15K). A key differentiator is its use of the dataset-construction approach from the Orca research paper, incorporating 15 system instructions to help the model learn 'thought' processes from a teacher model (ChatGPT).
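Prompting the model accordingly means wrapping each request in a system/user/response template. The sketch below assumes the Orca-style `### System:` / `### User:` / `### Input:` / `### Response:` layout commonly shown on the model card; verify the exact template against the card before use.

```python
# Sketch of an Orca-style prompt builder for orca_mini_v2_7b.
# The "### System/User/Input/Response" layout is an assumption based on
# the template published with the orca_mini models; check the model card
# for the authoritative format.

def build_prompt(system: str, instruction: str, input_text: str = "") -> str:
    """Assemble a single prompt string in the Orca-mini chat layout."""
    prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
    if input_text:
        # The Input block is optional and only included when context is supplied.
        prompt += f"### Input:\n{input_text}\n\n"
    prompt += "### Response:\n"
    return prompt

# Example: a system instruction asking the model to explain its reasoning,
# mirroring the teacher-model training objective described above.
prompt = build_prompt(
    system="You are an AI assistant that explains its reasoning step by step.",
    instruction="Write a Python function that reverses a string.",
)
```

The resulting string can be passed directly to any text-generation interface that accepts raw prompts.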

Key Capabilities & Features

  • Enhanced Code Generation: This version offers better code generation than the original orca_mini_7b.
  • Instruction Following: Trained to follow complex instructions by learning from explanation traces, similar to the Orca methodology.
  • Uncensored Training: Utilizes uncensored datasets for broader response generation.

Performance Highlights

Evaluated using the EleutherAI Language Model Evaluation Harness, the model achieved an average score of 52.62 across tasks including arc_challenge, hellaswag, mmlu, and truthfulqa_mc. On the HuggingFaceH4 Open LLM Leaderboard, it shows an average of 47.41, with specific scores including 50.77 on ARC and 76.02 on HellaSwag.

Training Details

The model was trained on 8x A100 (80G) GPUs for approximately 13 hours, utilizing DeepSpeed with ZeRO stage 3 for fully sharded data parallelism. Key training parameters included a batch size of 96, a learning rate of 2e-5, and a maximum sequence length of 1024 over 3 epochs.
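The reported setup can be sketched as a DeepSpeed configuration. The original config file was not published, so the dictionary below is a hypothetical ZeRO stage 3 sketch wired to the stated hyperparameters; the micro-batch split and precision setting are assumptions.

```python
# Hypothetical DeepSpeed ZeRO stage-3 config matching the reported training
# setup (global batch 96 on 8x A100, lr 2e-5). Key names follow the standard
# DeepSpeed configuration schema; the actual config was not published.

ds_config = {
    "train_batch_size": 96,                 # global batch size across 8 GPUs
    "train_micro_batch_size_per_gpu": 12,   # 96 / 8, assuming no gradient accumulation
    "zero_optimization": {
        "stage": 3,                         # shard params, gradients, and optimizer states
    },
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 2e-5},
    },
    "bf16": {"enabled": True},              # assumption; precision was not stated
}
```

With ZeRO stage 3, parameters, gradients, and optimizer states are all partitioned across the 8 GPUs, which is what makes fitting a 7B model with a batch of 96 at sequence length 1024 feasible on 80 GB cards.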

Limitations

Users should be aware that the model can produce factually incorrect outputs and may generate biased or offensive content due to its training on diverse public datasets.