tiiuae/falcon-7b
Falcon-7B is a 7 billion parameter causal decoder-only language model developed by TII, trained on 1,500 billion tokens of RefinedWeb data enhanced with curated corpora. Its architecture is optimized for inference, incorporating FlashAttention and multiquery attention. This pretrained model is designed to outperform comparable open-source models and is suitable as a foundation for further specialization and fine-tuning on various NLP tasks.
Falcon-7B: An Optimized 7B Parameter LLM
Falcon-7B is a 7 billion parameter causal decoder-only model developed by TII (Technology Innovation Institute). It was trained on an extensive dataset of 1,500 billion tokens, primarily from the RefinedWeb corpus, augmented with curated data. This model is released under the permissive Apache 2.0 license, allowing for commercial use.
Key Capabilities and Features
- Performance: Falcon-7B is reported to outperform comparable open-source models such as MPT-7B and StableLM, based on its standing on the Open LLM Leaderboard.
- Inference Optimization: The architecture is designed for efficient inference, using FlashAttention and multiquery attention.
- Training Data: The model was trained on 1,500 billion (1.5 trillion) tokens: 79% from RefinedWeb-English, with smaller shares from books, conversations (Reddit, StackOverflow), code, RefinedWeb-French, and technical papers.
- Multilingual Support: The model is primarily an English and French model; it has limited capabilities in other languages, including German, Spanish, Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish.
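To make the inference-optimization point more concrete, the sketch below estimates how much key/value cache a decoder stores per generated token under standard multi-head attention versus multiquery attention, where all query heads share a single key/value head. The layer count, head count, and head dimension are assumptions chosen to resemble a Falcon-7B-sized model; they are not stated in the text above, and the function name is hypothetical.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of key/value cache stored for each generated token."""
    # Factor 2: one key vector and one value vector per KV head per layer.
    # bytes_per_value=2 corresponds to 16-bit storage such as bfloat16.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Assumed model shape for illustration only (not given in the text above).
N_LAYERS = 32
N_QUERY_HEADS = 71
HEAD_DIM = 64

# Multi-head attention: every query head has its own key/value head.
multi_head = kv_cache_bytes_per_token(N_LAYERS, N_QUERY_HEADS, HEAD_DIM)

# Multiquery attention: one key/value head is shared by all query heads.
multi_query = kv_cache_bytes_per_token(N_LAYERS, 1, HEAD_DIM)

print(f"multi-head : {multi_head:,} bytes per token")
print(f"multi-query: {multi_query:,} bytes per token")
print(f"reduction  : {multi_head // multi_query}x")
```

Under these assumptions the cache shrinks by a factor equal to the number of query heads, which is why multiquery attention mainly helps with long sequences and large batch sizes at inference time.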
Use Cases and Recommendations
Falcon-7B is a raw, pretrained model intended as a robust foundation for research and further specialization. It is highly recommended for:
- Foundation Model: Serving as a base for fine-tuning toward specific applications such as summarization, text generation, or chatbot development.
- Research: Exploring large language model capabilities and architectural optimizations.
Users should be aware that this model carries biases inherent in its web-scale training data and requires further fine-tuning and guardrails for production environments. For instruction-following tasks, the fine-tuned Falcon-7B-Instruct is recommended.
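As a starting point for experimenting with the raw pretrained model, the following is a minimal sketch that loads `tiiuae/falcon-7b` through the Hugging Face `transformers` text-generation pipeline. It assumes that `torch`, `transformers`, and `accelerate` are installed and that enough GPU or CPU memory is available for a 7B model in bfloat16. The helper name `generate_text` and the sampling settings are illustrative choices, not recommendations from the model's authors.

```python
MODEL_ID = "tiiuae/falcon-7b"


def generate_text(prompt, max_new_tokens=50):
    """Generate a continuation of `prompt` with the pretrained base model."""
    # Imports are kept local so that defining this helper does not require
    # torch or transformers to be installed.
    import torch
    from transformers import AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    generator = pipeline(
        "text-generation",
        model=MODEL_ID,
        tokenizer=tokenizer,
        torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory
        device_map="auto",           # needs `accelerate`; places layers on available devices
    )
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # illustrative sampling settings
        top_k=10,
    )
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]
```

Calling `generate_text("The Technology Innovation Institute is")` would download the model weights on first use and return the prompt followed by a sampled continuation. Because this is a base model rather than an instruction-tuned one, it continues text instead of following instructions.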