stabilityai/stablelm-3b-4e1t
StableLM-3B-4E1T is a 2.8 billion parameter decoder-only language model developed by Stability AI. It was pre-trained for four epochs on 1 trillion tokens of diverse English and code data. It is intended as a base model for application-specific fine-tuning and suits tasks that call for a compact model covering both English and code.
Overview
StableLM-3B-4E1T is a 2.8 billion parameter decoder-only transformer developed by Stability AI. It was pre-trained for four epochs on 1 trillion tokens of diverse English and code data, which is where the "4E1T" suffix comes from. The architecture is LLaMA-like with modifications: it uses Rotary Position Embeddings and standard LayerNorm rather than the RMSNorm used in LLaMA.
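A minimal sketch of loading the base model for text continuation, assuming the Hugging Face `transformers` library (a release with StableLM support) and PyTorch are installed and that the checkpoint can be downloaded under the repository id shown in the title. The helper name `generate` and its defaults are illustrative, not part of the model card.

```python
MODEL_ID = "stabilityai/stablelm-3b-4e1t"  # repository id from the title


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` with the base model (hypothetical helper).

    Calling this downloads several gigabytes of weights on first use.
    """
    # Imported lazily so the module can be imported without the dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # torch_dtype="auto" loads the weights in the dtype stored in the checkpoint.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding keeps the sketch deterministic.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because this is a base model rather than an instruction-tuned one, prompts work best when phrased as text to be continued, for example `generate("def fibonacci(n):")`.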
Key Capabilities
- Foundational Model: Designed as a base model for further fine-tuning on specific downstream tasks.
- Efficient Architecture: A decoder-only transformer with 32 layers and a hidden size of 2560. Rotary Position Embeddings are applied to only part of each attention head's dimensions, which improves throughput.
- Extensive Pre-training: Trained on a large corpus including Falcon RefinedWeb, RedPajama-Data, The Pile, and StarCoder datasets.
- English and Code Proficiency: Pre-trained on both English text and source code, so it can be applied to tasks in either domain.
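The layer count and hidden size above can be tied back to the 2.8 billion figure with a rough parameter count. The sketch below takes 32 layers and a hidden size of 2560 from the text. The feed-forward width (6912), the vocabulary size (50304), untied input and output embeddings, a gated three-matrix feed-forward block, and bias-free projections are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    num_layers: int = 32            # from the text
    hidden_size: int = 2560         # from the text
    intermediate_size: int = 6912   # assumption: feed-forward width
    vocab_size: int = 50304         # assumption: padded vocabulary size
    tied_embeddings: bool = False   # assumption: separate input/output matrices


def count_parameters(cfg: Config) -> int:
    h = cfg.hidden_size
    # Query, key, value, and output projections, assumed to have no biases.
    attention = 4 * h * h
    # Gated feed-forward block: gate, up, and down projections (assumption).
    mlp = 3 * h * cfg.intermediate_size
    # Two LayerNorms per block, each with a weight and a bias vector.
    norms = 2 * 2 * h
    per_layer = attention + mlp + norms

    embeddings = cfg.vocab_size * h
    if not cfg.tied_embeddings:
        embeddings *= 2  # separate output projection
    final_norm = 2 * h

    return cfg.num_layers * per_layer + embeddings + final_norm


if __name__ == "__main__":
    total = count_parameters(Config())
    print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

Under these assumptions the total comes to roughly 2.8 billion, with the transformer blocks accounting for about 2.5 billion and the embedding matrices for the rest.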
Good For
- Application-Specific Fine-tuning: Ideal for developers looking to fine-tune a compact model for specialized tasks.
- Resource-Constrained Environments: Its 2.8 billion parameters make it suitable for deployment where larger models might be impractical.
- Research and Development: Provides a strong base for exploring language model capabilities and architectural modifications.
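To make the resource-constrained case concrete, the weight memory can be estimated from the parameter count alone. This is back-of-the-envelope arithmetic only: it ignores activations, the KV cache, and framework overhead, and the quantized rows assume an idealized 1 or 0.5 bytes per parameter.

```python
# Approximate storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(2.8e9, dtype):.1f} GB")
```

At 16-bit precision the weights of a 2.8 billion parameter model take about 5.6 GB, which is why a model of this size can fit on a single consumer GPU where much larger models cannot.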
Limitations
As a base model, StableLM-3B-4E1T may exhibit undesirable behaviors that require evaluation and fine-tuning for safe deployment. Users should exercise caution and thoroughly test the model for their specific use cases.