Overview
Finch-MoE-37B-A11B-v0.1-HF: A Mixture of Experts RWKV Model
recursal/Finch-MoE-37B-A11B-v0.1-HF is a Hugging Face compatible implementation of the Flock of Finches Mixture of Experts (MoE) model, developed by Recursal with compute sponsored by TensorWave. The model has 37 billion total parameters, of which roughly 11 billion are active per token, aiming for a balance of performance and efficiency.
Key Capabilities
- Mixture of Experts Architecture: Utilizes an MoE design, which can offer improved performance and potentially more efficient inference compared to dense models of similar total parameter count.
- Improved General Language Understanding: Demonstrates notable gains across various benchmarks compared to its predecessors, Eagle 7B, Finch 7B, and Finch 14B.
- Hugging Face Compatibility: Fully compatible with the Hugging Face transformers library for straightforward integration and deployment.
- Multilingual Support: Example usage shows generation in Chinese, indicating potential for multilingual applications.
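As a rough sketch of the Hugging Face integration, the model can be loaded through the standard transformers auto classes. The exact loading options are assumptions here (in particular, trust_remote_code=True is assumed because RWKV-family repositories often ship custom modeling code, and bfloat16 is assumed as a memory-friendly dtype); check the model card on the Hub for the author's recommended settings.

```python
# Hypothetical loading sketch for recursal/Finch-MoE-37B-A11B-v0.1-HF.
MODEL_ID = "recursal/Finch-MoE-37B-A11B-v0.1-HF"

def load_finch_moe(device: str = "cuda"):
    """Load the tokenizer and model via the transformers auto classes.

    Imports are kept inside the function so the sketch can be read
    without pulling in heavy dependencies at module import time.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True is an assumption: custom RWKV/MoE modeling
    # code may live in the model repository itself.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumed dtype; adjust to your hardware
        trust_remote_code=True,
    ).to(device)
    return tokenizer, model
```

A typical generation call would then tokenize a prompt, run `model.generate(...)`, and decode the result with the tokenizer, as in any other transformers causal LM workflow.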
Good for
- General-purpose text generation: Capable of generating detailed responses to various prompts, as shown in the examples.
- Research and experimentation with MoE models: Provides a readily available MoE model within the RWKV family for developers to explore.
- Applications requiring enhanced reasoning: Shows improvements on benchmarks like ARC-C and Winogrande, suggesting stronger reasoning capabilities.
Performance Highlights
Evaluations show the Flock of Finches 37B-A11B v0.1 model outperforming earlier RWKV models:
- ARC-C: 48.04% (vs. 39.59% for Eagle 7B)
- MMLU: 55.58% (vs. 30.86% for Eagle 7B)
- Winogrande: 75.14% (vs. 67.56% for Eagle 7B)
This model is a significant step in the RWKV series, offering a powerful MoE option for a range of language tasks.