vilsonrodrigues/falcon-7b-sharded

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 32k | Published: Jun 16, 2023 | License: apache-2.0 | Architecture: Transformer | Open Weights

The vilsonrodrigues/falcon-7b-sharded model is a re-sharded copy of Falcon-7B, a 7 billion parameter causal decoder-only language model developed by TII. Falcon-7B was trained on 1,500 billion tokens of RefinedWeb enhanced with curated corpora, and its architecture is optimized for inference with FlashAttention and multiquery attention. The sharded checkpoint is intended for low-RAM environments and is suitable for research and as a foundation for further fine-tuning.


Overview

This model, vilsonrodrigues/falcon-7b-sharded, is a re-sharded version of the original Falcon-7B model by TII, optimized for environments with limited RAM, such as Colab or Kaggle. Falcon-7B is a 7 billion parameter causal decoder-only model trained on 1,500 billion tokens, primarily from the RefinedWeb dataset enhanced with curated corpora. It is released under the permissive Apache 2.0 license, allowing for commercial use.
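To make the limited-RAM motivation concrete, here is a rough back-of-the-envelope estimate of how much memory the weights alone occupy at different precisions. It assumes a nominal 7 billion parameters (the exact count may differ slightly) and ignores activations, the KV cache, and framework overhead. The `weight_gib` helper and the dtype table are illustrative names, not part of any library.

```python
# Rough estimate of weight memory for a nominal 7B-parameter model.
# Assumption: exactly 7e9 parameters; the real count may differ slightly.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}

def weight_gib(n_params: float, dtype: str) -> float:
    """Return the size of the weights in GiB for the given precision."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30

N_PARAMS = 7e9  # nominal parameter count (assumption)

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_gib(N_PARAMS, dtype):6.2f} GiB")
```

At 16-bit precision the weights come to roughly 13 GiB, which is about the total RAM of a free notebook instance. That is why loading the checkpoint one small shard at a time, rather than materializing one large file, matters in these environments.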

Key Capabilities & Features

  • Optimized Architecture: Uses FlashAttention and multiquery attention for efficient inference.
  • Strong Performance: Outperformed comparable open-source models in its size class on the OpenLLM Leaderboard at the time of release.
  • Extensive Training Data: Trained on 1,500 billion (1.5 trillion) tokens, mostly RefinedWeb web data, plus curated sources such as books, conversations, and code.
  • Low RAM Compatibility: The checkpoint is split into smaller safetensors shards, so it can be loaded in memory-constrained settings.
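The inference benefit of multiquery attention can be illustrated with a small calculation of key/value cache size. The sketch below assumes the dimensions reported for the original Falcon-7B (32 layers, 71 attention heads, head dimension 64) and 16-bit cache values. These numbers do not appear on this page, so treat them as assumptions, and the 2,048-token sequence length is purely illustrative.

```python
# KV-cache size per token: multi-head vs. multiquery attention.
# Assumed dimensions (from the original Falcon-7B description, not this page).
N_LAYERS = 32
N_HEADS = 71
HEAD_DIM = 64

def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes cached per generated token: one key and one value vector
    per key/value head, per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value

# Standard multi-head attention keeps a key/value pair for every head.
mha = kv_cache_bytes_per_token(N_LAYERS, N_HEADS, HEAD_DIM)
# Multiquery attention shares a single key/value head across all query heads.
mqa = kv_cache_bytes_per_token(N_LAYERS, 1, HEAD_DIM)

SEQ_LEN = 2048  # illustrative sequence length (assumption)
print(f"multi-head: {mha * SEQ_LEN / 2**20:.0f} MiB per sequence")
print(f"multiquery: {mqa * SEQ_LEN / 2**20:.0f} MiB per sequence")
print(f"reduction factor: {mha // mqa}x")
```

Under these assumptions the cache shrinks by a factor equal to the number of heads, which reduces memory traffic during generation and leaves more room for longer sequences or larger batches.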

Intended Use Cases

  • Research: Ideal for academic and experimental research on large language models.
  • Foundation Model: Serves as a robust base for further specialization and fine-tuning for specific applications like summarization, text generation, or chatbot development.
  • Low-Resource Environments: Particularly useful for developers working in environments with limited computational resources, such as cloud-based notebooks.
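A minimal sketch of loading the sharded checkpoint in a notebook with the Hugging Face `transformers` library follows. It assumes `torch`, `transformers`, and `accelerate` are installed (`device_map="auto"` relies on `accelerate`) and that the installed `transformers` version supports Falcon natively; older versions may need `trust_remote_code=True`. The `load_model` and `generate` helpers are illustrative names, not part of the model repository.

```python
MODEL_ID = "vilsonrodrigues/falcon-7b-sharded"

def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model while keeping peak CPU RAM low."""
    # Deferred imports: defining this function does not require the
    # heavy dependencies to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # 16-bit weights halve memory vs. float32
        device_map="auto",           # place layers on GPU/CPU as space allows
        low_cpu_mem_usage=True,      # load shard by shard, not all at once
    )
    return tokenizer, model

def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a continuation of `prompt` (downloads the model on first use)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"].to(model.device),
        attention_mask=inputs["attention_mask"].to(model.device),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Write a short note about falcons:")` would download the shards on first use and return the prompt plus its continuation. Because this is a base model rather than an instruction-tuned one, it continues text rather than following instructions.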