Overview

StableLM-3B-4E1T is a 2.8 billion parameter decoder-only transformer language model developed by Stability AI. It was pre-trained for four epochs over a 1 trillion token corpus of diverse English and code datasets, which is what the "4E1T" in its name refers to. The architecture is LLaMA-like with modifications, including Rotary Position Embeddings and LayerNorm in place of LLaMA's RMSNorm.

Key Capabilities

Foundational Model: Designed as a base model for further fine-tuning on specific downstream tasks.
Efficient Architecture: A decoder-only transformer with 32 layers and a hidden size of 2560. Rotary Position Embeddings are applied to only a fraction of each attention head's dimensions to improve throughput.
Extensive Pre-training: Trained on a large corpus including Falcon RefinedWeb, RedPajama-Data, The Pile, and StarCoder datasets.
English and Code Proficiency: Pre-training covered both English text and source code, so the model can be applied to tasks in either domain.
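
As a rough sanity check on the figures above, the parameter count can be estimated from the layer count and hidden size. The sketch below is illustrative only: the 32 layers and 2560 hidden size come from this page, while the MLP intermediate size (6912), vocabulary size (50304), gated three-matrix MLP, bias-free linear layers, and untied input/output embeddings are assumptions not stated here and should be checked against the model's published configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    n_layers: int = 32        # stated on this page
    hidden: int = 2560        # stated on this page
    intermediate: int = 6912  # ASSUMPTION: MLP inner width
    vocab: int = 50304        # ASSUMPTION: tokenizer vocabulary size


def count_params(cfg: Config) -> int:
    """Estimate parameters of a LLaMA-like decoder that uses LayerNorm."""
    embed = cfg.vocab * cfg.hidden            # input token embeddings
    lm_head = cfg.vocab * cfg.hidden          # ASSUMPTION: output head not tied to embeddings
    attn = 4 * cfg.hidden * cfg.hidden        # Q, K, V, O projections, assumed bias-free
    mlp = 3 * cfg.hidden * cfg.intermediate   # ASSUMPTION: gated MLP (gate, up, down)
    norms = 2 * (2 * cfg.hidden)              # two LayerNorms per block, weight + bias each
    final_norm = 2 * cfg.hidden               # LayerNorm before the output head
    per_layer = attn + mlp + norms
    return embed + lm_head + cfg.n_layers * per_layer + final_norm


total = count_params(Config())
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

Under these assumptions the estimate lands at about 2.8 billion, in line with the size quoted in the Overview. Most of the weight sits in the per-layer MLP and attention matrices rather than in the embeddings.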

Good For

Application-Specific Fine-tuning: Ideal for developers looking to fine-tune a compact model for specialized tasks.
Resource-Constrained Environments: Its 2.8 billion parameters make it suitable for deployment where larger models might be impractical.
Research and Development: Provides a strong base for exploring language model capabilities and architectural modifications.
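
To make the resource-constrained point concrete, the memory needed just to hold the weights can be estimated from the parameter count and the numeric precision. This is a back-of-the-envelope sketch: the function name and the precision table are illustrative, and the result excludes activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


n_params = 2.8e9  # parameter count quoted on this page
for dtype in ("float32", "float16", "int8"):
    print(f"{dtype:>8}: {weight_memory_gib(n_params, dtype):.1f} GiB")
```

At 16-bit precision the weights come to a little over 5 GiB, which is why a model of this size can fit on a single consumer GPU where much larger models cannot.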

Limitations

As a base model, StableLM-3B-4E1T may exhibit undesirable behaviors that require evaluation and fine-tuning for safe deployment. Users should exercise caution and thoroughly test the model for their specific use cases.
