ibm-research/merlinite-7b

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Mar 2, 2024 · License: apache-2.0 · Architecture: Transformer

Merlinite-7b is a 7 billion parameter language model developed by IBM Research, based on the Mistral-7B-v0.1 architecture. It is fine-tuned using the Large-scale Alignment for chatBots (LAB) method, which employs a taxonomy-driven synthetic data generation process with Mixtral-8x7B-Instruct as a teacher model. This approach allows for incremental knowledge and skill acquisition without catastrophic forgetting, making it suitable for diverse conversational AI applications.


Merlinite-7b: A LAB-Aligned Mistral Model

Merlinite-7b is a 7 billion parameter language model from IBM Research, built upon the Mistral-7B-v0.1 base. Its key differentiator is the Large-scale Alignment for chatBots (LAB) methodology, a novel synthetic-data-based alignment tuning approach. LAB uses Mixtral-8x7B-Instruct as a teacher model and combines a taxonomy-driven data curation process, a large-scale synthetic data generator, and two-phase training with replay buffers.

Key Capabilities & Differentiators

  • LAB Methodology: Enables incremental addition of new knowledge and skills to a pre-trained model without catastrophic forgetting.
  • Taxonomy-driven Data Generation: Unlike uniform sampling, LAB uses a taxonomy of seed examples to drive the sampling process, ensuring diverse task coverage and efficient use of the teacher model.
  • Competitive Performance: Benchmarks show Merlinite-7b achieving strong results, including 7.66 on MTBench (Avg) and 64.88 on MMLU, performing competitively with larger models like Orca-2-13b and WizardLM-13B-V1.2.
  • Two-Phase Training: Involves distinct knowledge-tuning (simple, then complex) and skills-tuning phases, incorporating replay buffers to mitigate forgetting between phases.
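The taxonomy-driven idea above can be sketched in a few lines: seed examples live at the leaves of a task taxonomy, and teacher prompts are drawn per leaf rather than by uniform sampling, so every task branch is covered. This is a hypothetical illustration, not IBM's implementation; the names `TaxonomyNode` and `build_teacher_prompts` are invented for this sketch.

```python
# Hypothetical sketch of LAB-style taxonomy-driven sampling.
# All names here are illustrative, not part of any IBM API.
from dataclasses import dataclass, field

@dataclass
class TaxonomyNode:
    name: str
    seed_examples: list = field(default_factory=list)   # (instruction, response) pairs
    children: list = field(default_factory=list)

def leaves(node):
    """Yield every leaf node of the taxonomy tree."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)

def build_teacher_prompts(root, per_leaf=2):
    """Build one batch of teacher prompts, sampled per leaf for even task coverage."""
    prompts = []
    for leaf in leaves(root):
        for instruction, response in leaf.seed_examples[:per_leaf]:
            prompts.append(
                f"Generate a new example similar to:\n"
                f"Instruction: {instruction}\nResponse: {response}"
            )
    return prompts

root = TaxonomyNode("skills", children=[
    TaxonomyNode("writing", seed_examples=[("Summarize this memo.", "...")]),
    TaxonomyNode("coding", seed_examples=[("Reverse a string in Python.", "s[::-1]")]),
])
prompts = build_teacher_prompts(root)  # one prompt per seed example here
```

Because sampling is keyed to leaves instead of the raw example pool, a branch with few seeds still contributes to every batch, which is the coverage property LAB's curation process relies on.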

When to Use This Model

Merlinite-7b is well-suited for applications that need a capable 7B parameter model and benefit from structured, synthetic-data-driven alignment. Its LAB methodology makes it particularly interesting for scenarios where incremental skill and knowledge acquisition matter, and for building conversational AI agents that must perform robustly across diverse tasks.
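For chat use, prompts are typically assembled with role tags before being passed to a standard causal-LM inference stack. The sketch below assumes the `<|system|>`/`<|user|>`/`<|assistant|>` layout commonly used with LAB-aligned models; verify against the model's own chat template before relying on it, and treat the system message here as a placeholder.

```python
# Assumed single-turn chat layout for Merlinite-7b; the exact template and
# official system prompt should be checked against the model card.
SYSTEM_PROMPT = (
    "You are Merlinite, an AI language model developed by IBM Research."
)  # placeholder system message, not the verified official text

def format_chat(user_message: str, system: str = SYSTEM_PROMPT) -> str:
    """Assemble a single-turn prompt in the assumed role-tag format."""
    return (
        f"<|system|>\n{system}\n"
        f"<|user|>\n{user_message}\n"
        f"<|assistant|>\n"
    )

prompt = format_chat("Explain the LAB alignment method in one sentence.")
```

The resulting string can be fed to any tokenizer/generate pipeline that serves the model; generation should stop when the model emits its end-of-text token.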