pankajmathur/orca_mini_v7_72b

Hugging Face
Text Generation · Concurrency Cost: 4 · Model Size: 72.7B · Quant: FP8 · Ctx Length: 32k · Published: Jun 26, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

pankajmathur/orca_mini_v7_72b is a 72.7-billion-parameter, Qwen2-based, instruction-tuned causal language model developed by pankajmathur. It was trained on a variety of SFT (Supervised Fine-Tuning) datasets, making it a strong general-purpose model intended for further customization and enhancement. YaRN extends its native 32k context window to 131,072 tokens for efficient processing of long texts, and it is well suited as a foundation for advanced fine-tuning or model merging.


Model Overview

pankajmathur/orca_mini_v7_72b is a 72.7-billion-parameter model built on the Qwen2 architecture, developed by pankajmathur. It has been extensively trained on a variety of Supervised Fine-Tuning (SFT) datasets, positioning it as a robust general-purpose model. A key feature is its 131,072-token context length, achieved by integrating YaRN (Yet another RoPE extensioN) to handle long inputs efficiently.
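In Qwen2-based models, YaRN long-context support is typically enabled by a `rope_scaling` entry in the model's `config.json`. As a sketch, an entry extending a native 32k window to roughly 131k might look like the following (the values follow the common Qwen2 YaRN recipe and are assumptions; confirm against this model's actual config):

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

The scaling factor of 4.0 reflects the ratio between the extended (131,072) and original (32,768) position limits.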

Key Capabilities

  • General-Purpose Foundation: Designed as a comprehensive base model suitable for a wide array of NLP tasks.
  • Long Context Handling: Utilizes YaRN to process inputs of up to 131,072 tokens, making it effective for applications requiring extensive textual understanding.
  • Customization Ready: Explicitly intended for further fine-tuning (full fine-tuning, DPO, PPO, ORPO) and model merging, encouraging developers to adapt it to specific needs.
  • Instruction-Tuned: Benefits from SFT datasets, enhancing its ability to follow instructions and generate relevant responses.
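Because the model is instruction-tuned on a Qwen2 base, prompts follow the ChatML format. In practice `tokenizer.apply_chat_template()` builds this string for you; the sketch below only illustrates the layout (the helper name and example messages are hypothetical):

```python
def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for m in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # A trailing assistant header cues the model to begin its reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize YaRN in one sentence."},
])
print(prompt)
```

Using the tokenizer's built-in chat template is preferred in real code, since it stays in sync with the model's own formatting.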

Performance Highlights

Evaluations on the Open LLM Leaderboard show an average score of 39.06. Specific metrics include 59.30 on IFEval (0-shot), 55.06 on BBH (3-shot), and 51.35 on MMLU-PRO (5-shot).

Recommended Usage

This model is ideal for developers looking for a powerful, adaptable base model for advanced fine-tuning or integration into larger systems. Its long-context capabilities make it particularly useful for applications involving detailed document analysis or extended conversational AI. The README provides example usage with transformers and deployment instructions using vLLM for optimal performance, especially with long contexts.
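As a sketch of the vLLM deployment route mentioned in the README, a long-context launch command might look like the following (the tensor-parallel degree and exact flags are assumptions; adjust to your hardware and the README's own instructions):

```shell
# Sketch: serve the model via vLLM's OpenAI-compatible server.
# --max-model-len assumes the YaRN-extended 131,072-token context;
# --tensor-parallel-size 4 is a hypothetical multi-GPU setting.
vllm serve pankajmathur/orca_mini_v7_72b \
  --max-model-len 131072 \
  --tensor-parallel-size 4
```

Note that a 72.7B model, even at FP8, requires substantial GPU memory, so multi-GPU tensor parallelism is usually necessary.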