stabilityai/stablelm-3b-4e1t

Text Generation | Concurrency Cost: 1 | Model Size: 2.2B | Quant: BF16 | Ctx Length: 32k | Published: Sep 29, 2023 | License: cc-by-sa-4.0 | Architecture: Transformer | Open Weights

StableLM-3B-4E1T is a 2.8 billion parameter decoder-only language model developed by Stability AI. It was pre-trained on 1 trillion tokens of diverse English and code data for four epochs, or roughly 4 trillion training tokens in total. It is intended as a foundational base model for application-specific fine-tuning, particularly where a compact model covering both English and code is needed.


Overview

StableLM-3B-4E1T is a 2.8 billion parameter decoder-only transformer developed by Stability AI. It was pre-trained for four epochs on 1 trillion tokens of diverse English and code data. The architecture follows LLaMA with two modifications: Rotary Position Embeddings are applied to only the first 25% of each attention head's dimensions, which improves throughput, and normalization uses LayerNorm with learned bias terms instead of RMSNorm.
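Partial rotary embeddings can be illustrated with a small pure-Python sketch that rotates only the leading quarter of a single head vector and passes the remaining dimensions through unchanged. It assumes 32 attention heads (so a head dimension of 2560 / 32 = 80), a RoPE base of 10000, and GPT-NeoX-style "rotate half" pairing, where dimension i rotates together with dimension i + rotary_dim / 2. The function `apply_partial_rope` is a hypothetical illustration, not Stability AI's implementation.

```python
import math


def apply_partial_rope(x, position, rotary_fraction=0.25, base=10000.0):
    """Rotate the first `rotary_fraction` of head vector `x` by position.

    Illustrative assumptions: GPT-NeoX-style pairing of dimension i with
    dimension i + rotary_dim // 2, and a RoPE base of 10000.
    """
    rotary_dim = int(len(x) * rotary_fraction)  # 80 * 0.25 = 20 for an 80-dim head
    half = rotary_dim // 2
    out = list(x)  # dimensions at index >= rotary_dim are left unrotated
    for i in range(half):
        # Each (i, i + half) pair is rotated at its own frequency.
        theta = position * base ** (-2 * i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


if __name__ == "__main__":
    head_dim = 2560 // 32  # 80, assuming 32 attention heads
    query = [math.sin(0.1 * t) for t in range(head_dim)]
    key = [math.cos(0.2 * t) for t in range(head_dim)]
    # Both scores use a relative offset of 2, so they agree up to rounding.
    print(dot(apply_partial_rope(query, 5), apply_partial_rope(key, 3)))
    print(dot(apply_partial_rope(query, 12), apply_partial_rope(key, 10)))
```

Because each pair is a plane rotation, the query-key score depends only on the relative offset between positions, while skipping 75% of the dimensions saves the corresponding sine and cosine work.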

Key Capabilities

  • Foundational Model: Designed as a base model for further fine-tuning on specific downstream tasks.
  • Efficient Architecture: A decoder-only transformer with 32 layers, a hidden size of 2560, and 32 attention heads; applying Rotary Position Embeddings to only part of each head's dimensions improves throughput.
  • Extensive Pre-training: Trained on a large corpus including Falcon RefinedWeb, RedPajama-Data, The Pile, and StarCoder datasets.
  • English and Code Proficiency: Because its pre-training data mixes English text and source code, the model is a suitable starting point for fine-tuning in either domain.
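As a sanity check on the 2.8 billion figure, the sketch below counts parameters from the architecture hyperparameters. The layer count and hidden size come from the description above; the intermediate size (6912), vocabulary size (50304), gated three-projection MLP, bias-free attention projections, and untied output head are assumptions taken from the publicly released configuration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StableLM3BConfig:
    # Stated in the model description.
    num_layers: int = 32
    hidden_size: int = 2560
    # Assumed from the public config, not stated in this description.
    intermediate_size: int = 6912
    vocab_size: int = 50304


def count_parameters(cfg: StableLM3BConfig) -> int:
    h, f, v = cfg.hidden_size, cfg.intermediate_size, cfg.vocab_size
    attention = 4 * h * h  # q, k, v, o projections, assumed bias-free
    mlp = 3 * h * f        # assumed gated MLP: gate, up and down projections
    norms = 2 * (2 * h)    # two LayerNorms per layer, each with weight and bias
    per_layer = attention + mlp + norms
    embeddings = v * h     # input token embeddings
    lm_head = v * h        # output projection, assumed untied from embeddings
    final_norm = 2 * h     # LayerNorm after the last block
    return cfg.num_layers * per_layer + embeddings + lm_head + final_norm


if __name__ == "__main__":
    total = count_parameters(StableLM3BConfig())
    print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

Under these assumptions the total is 2,795,443,200, which rounds to the 2.8 billion quoted above; roughly 91% of it sits in the 32 transformer blocks.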

Good For

  • Application-Specific Fine-tuning: Ideal for developers looking to fine-tune a compact model for specialized tasks.
  • Resource-Constrained Environments: With 2.8 billion parameters (about 5.6 GB of weights in BF16), it can be deployed where larger models would be impractical.
  • Research and Development: Provides a strong base for exploring language model capabilities and architectural modifications.
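For deployment planning, a back-of-the-envelope estimate of weight memory is simply parameter count times bytes per parameter. This sketch uses the rounded 2.8B count; the int8 and int4 entries ignore quantization metadata such as scales, and activations and the KV cache are not included. The dictionary and function names are hypothetical.

```python
# Bytes per parameter for common storage formats (int4 ignores scale overhead).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_bytes(num_params: int, dtype: str) -> float:
    """Memory for the weights alone; activations and KV cache are extra."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    params = 2_800_000_000  # rounded parameter count from the description
    for dtype in BYTES_PER_PARAM:
        gib = weight_memory_bytes(params, dtype) / 2**30
        print(f"{dtype:>5}: {gib:5.2f} GiB")
```

In BF16 the weights alone take 5.6e9 bytes (about 5.2 GiB), so an accelerator with 8 GB of memory can hold them with some room left for activations at modest batch sizes and context lengths.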

Limitations

As a base model, StableLM-3B-4E1T may exhibit undesirable behaviors that require evaluation and fine-tuning for safe deployment. Users should exercise caution and thoroughly test the model for their specific use cases.