tiiuae/falcon-rw-7b

Text Generation · Concurrency Cost: 1 · Model Size: 7.8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 26, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

Falcon-RW-7B is a 7.8 billion parameter causal decoder-only language model developed by TII and trained exclusively on 350 billion tokens of the RefinedWeb dataset. It is a research artifact for studying the impact of training solely on high-quality web data, and it performs comparably to, or better than, models trained on curated datasets. Its architecture is adapted from GPT-3 with ALiBi and FlashAttention added, and it is primarily intended for research into how web data influences LLM properties.


Falcon-RW-7B: A Research Model for Web Data Influence

Falcon-RW-7B is a 7.8 billion parameter causal decoder-only language model developed by TII. Its primary distinction lies in its training data: 350 billion tokens exclusively from RefinedWeb, a high-quality, filtered, and deduplicated web dataset. This model serves as a research artifact to investigate how training solely on web data impacts large language model properties, such as fairness, safety, and capabilities.
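To put the card's two headline figures in proportion, the sketch below divides the training tokens by the parameter count and applies the widely used "6 · N · D" rule of thumb for dense-transformer training compute. The 6 · N · D rule is a generic back-of-the-envelope approximation (it ignores attention-score FLOPs and implementation overheads), not a compute figure reported by TII; the helper names are hypothetical.

```python
N_PARAMS = 7.8e9   # parameter count stated on this card
N_TOKENS = 350e9   # RefinedWeb training tokens stated on this card


def tokens_per_parameter(tokens: float, params: float) -> float:
    """How many training tokens the model saw per parameter."""
    return tokens / params


def approx_training_flops(tokens: float, params: float) -> float:
    # Rule of thumb: ~2 FLOPs per parameter per token for the forward pass
    # plus ~4 for the backward pass. Attention-score FLOPs are ignored.
    return 6.0 * params * tokens


if __name__ == "__main__":
    print(f"tokens per parameter ~ {tokens_per_parameter(N_TOKENS, N_PARAMS):.1f}")
    print(f"training compute ~ {approx_training_flops(N_TOKENS, N_PARAMS):.2e} FLOPs")
```

Under these assumptions the model saw roughly 45 tokens per parameter, for an estimated training budget on the order of 1.6 × 10^22 FLOPs.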

Key Characteristics:

  • Training Data Focus: Trained entirely on RefinedWeb, showing that web-only data can yield performance matching or surpassing that of models trained on curated datasets.
  • Architecture: Adapts the GPT-3 architecture, using ALiBi (Attention with Linear Biases) in place of positional embeddings and FlashAttention for faster, more memory-efficient attention computation.
  • Language: English-only model.
  • License: Available under the Apache 2.0 license.
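ALiBi drops positional embeddings and instead adds a fixed, head-specific penalty to each attention logit that grows linearly with the distance between query and key. The pure-Python sketch below follows the geometric slope schedule from the ALiBi paper and, to stay short, assumes a power-of-two head count. It is an illustration of the technique, not TII's implementation: the function names are hypothetical, and the 8-head, 4-token setting in the checks is illustrative rather than Falcon-RW-7B's configuration. FlashAttention is a kernel-level optimization that computes the same exact attention more efficiently, so it is not modeled here.

```python
import math


def alibi_slopes(n_heads: int) -> list[float]:
    """Slopes m_h = 2^(-8h/n) for h = 1..n (power-of-two n only in this sketch)."""
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles a power-of-two head count")
    start = 2 ** (-8 / n_heads)
    return [start ** h for h in range(1, n_heads + 1)]


def alibi_bias(n_heads: int, seq_len: int) -> list[list[list[float]]]:
    """bias[h][i][j] = -m_h * (i - j) for past keys j <= i, -inf for future keys (causal mask)."""
    bias = []
    for m in alibi_slopes(n_heads):
        head = [
            [-m * (i - j) if j <= i else -math.inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        bias.append(head)
    return bias


def attention_weights(scores_row: list[float], bias_row: list[float]) -> list[float]:
    """Softmax over raw query-key scores plus the ALiBi bias for one query position."""
    logits = [s + b for s, b in zip(scores_row, bias_row)]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # exp(-inf) == 0.0, so masked keys get no weight
    total = sum(exps)
    return [e / total for e in exps]
```

With identical raw scores, the biased softmax gives the most weight to the nearest keys, which is how ALiBi injects position information without embeddings; heads with smaller slopes decay more slowly and can attend further back.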

Intended Use:

  • Direct Use: Primarily for research into the influence of web data on LLMs.
  • Out-of-Scope Use: Not recommended for production use without thorough risk assessment and mitigation. For general-purpose, state-of-the-art applications, TII recommends using Falcon-7B or Falcon-40B, which were trained on significantly larger and more diverse datasets.
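For research use, a typical starting point is the Hugging Face `transformers` text-generation pipeline. The sketch below assumes the checkpoint is loaded from the Hub under the ID in this card's title; `load_pipeline`, `generation_kwargs`, and `generate` are hypothetical helper names, and the sampling values are illustrative. Passing `trust_remote_code=True` is an assumption carried over from TII's Falcon examples (older releases of the checkpoint relied on custom modeling code); it may be unnecessary with recent `transformers` versions. `device_map="auto"` requires the `accelerate` package. The heavy imports live inside the loader so the helpers can be imported without a GPU or the model weights.

```python
from functools import lru_cache

MODEL_ID = "tiiuae/falcon-rw-7b"


def generation_kwargs(max_new_tokens: int = 64, top_k: int = 10) -> dict:
    """Illustrative sampling settings for quick qualitative probes."""
    return {"max_new_tokens": max_new_tokens, "do_sample": True, "top_k": top_k}


@lru_cache(maxsize=1)
def load_pipeline():
    # Imported lazily: loading the model downloads the weights and needs a large GPU.
    import torch
    import transformers

    tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
    return transformers.pipeline(
        "text-generation",
        model=MODEL_ID,
        tokenizer=tokenizer,
        torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use
        device_map="auto",           # requires `accelerate`
        trust_remote_code=True,      # assumption: only needed for older checkpoint/library versions
    )


def generate(prompt: str, **overrides) -> str:
    """Generate one continuation of `prompt`; keyword overrides replace the default settings."""
    pipe = load_pipeline()  # cached after the first call
    kwargs = {**generation_kwargs(), **overrides}
    outputs = pipe(prompt, pad_token_id=pipe.tokenizer.eos_token_id, **kwargs)
    return outputs[0]["generated_text"]
```

Caching the pipeline with `lru_cache` means repeated `generate("...")` calls in an experiment reuse the loaded model instead of reloading 7.8B parameters each time.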

Limitations:

  • As an English-only model, it is not expected to generalize appropriately to other languages.
  • It inherits the stereotypes and biases present in large-scale web corpora.

Users are encouraged to fine-tune the model for specific tasks and implement guardrails for any potential production deployment.
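One common guardrail pattern is to wrap the model's generate call so both the prompt and the output pass through a check before anything is returned. The sketch below shows only that wrapping structure, using a deliberately simple keyword blocklist; `with_guardrail`, the refusal string, and the blocklist are hypothetical, and keyword matching is far weaker than the classifier-based filtering and human review a real deployment would need.

```python
from typing import Callable, Iterable


def with_guardrail(
    generate: Callable[[str], str],
    blocked_terms: Iterable[str],
    refusal: str = "[response withheld by guardrail]",
) -> Callable[[str], str]:
    """Wrap a text generator so prompts and outputs containing blocked terms are refused."""
    terms = [t.lower() for t in blocked_terms]  # case-insensitive matching

    def contains_blocked(text: str) -> bool:
        lowered = text.lower()
        return any(term in lowered for term in terms)

    def guarded(prompt: str) -> str:
        if contains_blocked(prompt):       # input-side check: never call the model
            return refusal
        output = generate(prompt)
        if contains_blocked(output):       # output-side check: suppress the generation
            return refusal
        return output

    return guarded
```

Because the wrapper only assumes a `str -> str` callable, a stand-in such as `lambda p: p.upper()` is enough to exercise it, and the same wrapper can later sit in front of a real model call or a stronger moderation check.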