tiiuae/falcon-rw-7b
Falcon-RW-7B is a 7.8-billion-parameter causal decoder-only language model developed by TII and trained exclusively on 350 billion tokens of the RefinedWeb dataset. It is designed as a research artifact for studying the effect of training solely on high-quality web data, and it matches or outperforms comparable models trained on curated datasets. Its architecture is adapted from GPT-3 and incorporates ALiBi and FlashAttention. It is intended primarily for research into how web data influences LLM properties.
Falcon-RW-7B: A Research Model for Web Data Influence
Falcon-RW-7B is a 7.8-billion-parameter causal decoder-only language model developed by TII. Its primary distinction is its training data: 350 billion tokens drawn exclusively from RefinedWeb, a high-quality web dataset built through filtering and deduplication. The model serves as a research artifact for investigating how training solely on web data affects large language model properties such as fairness, safety, and capabilities.
Key Characteristics:
- Training Data Focus: Trained entirely on RefinedWeb, showing that a model trained on filtered and deduplicated web data alone can match or outperform comparable models trained on curated datasets.
- Architecture: Adapts the GPT-3 architecture, using ALiBi (Attention with Linear Biases) to encode token positions as distance-based attention penalties, and FlashAttention for faster, memory-efficient exact attention.
- Language: English-only model.
- License: Available under the Apache 2.0 license.
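The architecture bullet names ALiBi and FlashAttention without showing what they compute. As an illustration only (a pure-Python sketch, not code from TII's Falcon-RW training stack), the block below computes single-head causal attention with the ALiBi bias described by Press et al.: the score between query i and key j is penalized by slope × (i − j), with per-head slopes forming the geometric sequence 2^(-8/n), 2^(-16/n), … for n heads (assumed to be a power of two here). It then recomputes the same output one key block at a time using a running max and running softmax denominator, the online-softmax idea FlashAttention is built on. The function names, block size, and head count are hypothetical choices for the example.

```python
import math
import random


def alibi_slopes(n_heads: int) -> list[float]:
    """ALiBi head slopes: a geometric sequence that starts at 2^(-8/n)
    and uses 2^(-8/n) as its ratio. Assumes n_heads is a power of two."""
    start = 2.0 ** (-8.0 / n_heads)
    return [start ** (h + 1) for h in range(n_heads)]


def score(qi, kj, i, j, slope):
    """Scaled dot product plus the ALiBi bias -slope * (i - j).
    Keys after the query (j > i) are masked out, as in a causal decoder."""
    if j > i:
        return -math.inf
    dot = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(len(qi))
    return dot - slope * (i - j)


def attention_naive(q, k, v, slope):
    """Reference: materialize each full score row, then softmax it."""
    out = []
    for i, qi in enumerate(q):
        s = [score(qi, kj, i, j, slope) for j, kj in enumerate(k)]
        m = max(s)  # subtract the row max for numerical stability
        w = [math.exp(x - m) for x in s]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z
                    for c in range(len(v[0]))])
    return out


def attention_tiled(q, k, v, slope, block=2):
    """Online-softmax version: visit keys one block at a time, keeping only a
    running max m, running denominator l and unnormalized accumulator acc.
    Blocks entirely in the future of query i are skipped (causal skipping)."""
    out = []
    for i, qi in enumerate(q):
        m, l = -math.inf, 0.0
        acc = [0.0] * len(v[0])
        for start in range(0, i + 1, block):
            idx = range(start, min(start + block, i + 1))
            s = [score(qi, k[j], i, j, slope) for j in idx]
            m_new = max(m, max(s))
            rescale = math.exp(m - m_new)  # shrink earlier partial sums
            p = [math.exp(x - m_new) for x in s]
            l = l * rescale + sum(p)
            acc = [a * rescale + sum(pj * v[j][c] for pj, j in zip(p, idx))
                   for c, a in enumerate(acc)]
            m = m_new
        out.append([a / l for a in acc])
    return out


if __name__ == "__main__":
    random.seed(0)
    T, d = 6, 4
    q, k, v = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
               for _ in range(3))
    slope = alibi_slopes(8)[0]  # first head of a hypothetical 8-head layer
    a = attention_naive(q, k, v, slope)
    b = attention_tiled(q, k, v, slope, block=2)
    print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

Running it prints the largest elementwise gap between the two outputs, which should be at floating-point rounding level: the block-wise pass is exact rather than an approximation, which is the property FlashAttention relies on. The real kernel gets its speed from keeping tiles in fast on-chip GPU memory, which this sketch does not attempt to model.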
Intended Use:
- Direct Use: Primarily for research into the influence of web data on LLMs.
- Out-of-Scope Use: Not recommended for production use without thorough risk assessment and mitigation. For uses not directly related to research on web data, TII recommends Falcon-7B or Falcon-40B, which were trained on significantly larger and more diverse datasets.
Limitations:
- As an English-only model, it will not generalize to other languages.
- Inherits stereotypes and biases present in large-scale web corpora.
Users are encouraged to fine-tune the model for specific tasks and implement guardrails for any potential production deployment.