recursal/Finch-MoE-37B-A11B-v0.1-HF

Public · 37B · FP8 · 16384
Nov 5, 2024 · License: apache-2.0

Overview

Finch-MoE-37B-A11B-v0.1-HF: A Mixture of Experts RWKV Model

recursal/Finch-MoE-37B-A11B-v0.1-HF is a Hugging Face compatible implementation of the Flock of Finches Mixture of Experts (MoE) model, developed by Recursal with compute sponsored by TensorWave. The model has 37 billion total parameters, of which 11 billion are active per token, aiming for a balance of capability and inference efficiency.
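The active-parameter saving comes from sparse routing: a learned router picks a small top-k subset of expert sub-networks for each token, so only those experts' weights are applied. The following is a minimal NumPy sketch of top-k MoE routing for illustration only; the expert count, dimensions, and ReLU feed-forward experts are made-up toy values, not this model's actual architecture or code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, top_k = 8, 2          # hypothetical sizes, for illustration only
d_model, d_ff = 16, 64

# Router: one linear layer scoring every expert for a given token.
W_router = rng.normal(size=(d_model, n_experts))
# Expert feed-forward weights; only the chosen top-k are applied per token.
W_in = rng.normal(size=(n_experts, d_model, d_ff)) * 0.1
W_out = rng.normal(size=(n_experts, d_ff, d_model)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(x):
    """Route one token vector through its top-k experts."""
    scores = x @ W_router                 # (n_experts,) router logits
    top = np.argsort(scores)[-top_k:]     # indices of the k highest-scoring experts
    gates = softmax(scores[top])          # renormalized gate weights over the top-k
    y = np.zeros_like(x)
    for g, e in zip(gates, top):
        h = np.maximum(x @ W_in[e], 0.0)  # expert FFN (ReLU, toy example)
        y += g * (h @ W_out[e])           # gate-weighted sum of expert outputs
    return y, top

token = rng.normal(size=d_model)
y, used = moe_layer(token)
print(f"experts used: {sorted(used.tolist())} of {n_experts}")
```

Because only `top_k` of the `n_experts` expert weight matrices touch each token, per-token compute scales with the active parameters rather than the total, which is the same principle behind the 11B-active / 37B-total split.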

Key Capabilities

  • Mixture of Experts Architecture: Routes each token through a subset of expert sub-networks, so only 11B of the 37B total parameters are active per token, giving inference costs closer to a dense model of the active size while retaining the capacity of the full parameter count.
  • Improved General Language Understanding: Demonstrates notable gains across various benchmarks compared to its predecessors, Eagle 7B, Finch 7B, and Finch 14B.
  • Hugging Face Compatibility: Fully compatible with the Hugging Face transformers library for straightforward integration and deployment.
  • Multilingual Support: Example usage shows generation in Chinese, indicating potential for multilingual applications.
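Given the stated transformers compatibility, loading and generating typically looks like the sketch below. The `trust_remote_code`, dtype, and generation settings here are assumptions for custom RWKV/MoE architectures, not confirmed details of this repository; prefer the usage snippet on the model page itself if one is provided.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "recursal/Finch-MoE-37B-A11B-v0.1-HF"

# trust_remote_code is assumed here because custom architectures
# often ship their modeling code in the repository.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",   # pick the checkpoint's native dtype
    device_map="auto",    # spread layers across available devices
)

prompt = "The RWKV architecture differs from a Transformer in that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that a 37B-parameter checkpoint requires substantial GPU memory even with `device_map="auto"`; quantized loading may be needed on smaller hardware.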

Good for

  • General-purpose text generation: Capable of generating detailed responses to various prompts, as shown in the examples.
  • Research and experimentation with MoE models: Provides a readily available MoE model within the RWKV family for developers to explore.
  • Applications requiring enhanced reasoning: Shows improvements on benchmarks like ARC-C and Winogrande, suggesting stronger reasoning capabilities.

Performance Highlights

Evaluations show the Flock of Finches 37B-A11B v0.1 model outperforming earlier RWKV models:

  • ARC-C: 48.04% (vs. 39.59% for Eagle 7B)
  • MMLU: 55.58% (vs. 30.86% for Eagle 7B)
  • Winogrande: 75.14% (vs. 67.56% for Eagle 7B)
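Benchmark numbers of this kind are commonly reproduced with EleutherAI's lm-evaluation-harness. The command below is a hedged sketch; the exact task names, harness version, and settings the authors used are not stated here, so scores may not match precisely.

```shell
pip install lm-eval

# Task names assume the standard harness identifiers for these benchmarks.
lm_eval --model hf \
  --model_args pretrained=recursal/Finch-MoE-37B-A11B-v0.1-HF,trust_remote_code=True \
  --tasks arc_challenge,mmlu,winogrande \
  --batch_size 1
```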

This model is a significant step in the RWKV series, offering a powerful MoE option for a range of language tasks.