budecosystem/boomer-1b
budecosystem/boomer-1b is a 1.1 billion parameter language model developed by BudEcosystem, pretrained from scratch on a custom-curated dataset of 41 billion tokens. It features a custom architecture incorporating flash attention and a higher intermediate MLP dimension, designed for efficient language modeling. The model is suitable for retrieval augmentation, inference at the edge, and general language modeling tasks.
Overview
budecosystem/boomer-1b is a 1.1 billion parameter language model developed by BudEcosystem, pretrained from scratch on a custom-curated dataset of 41 billion tokens. This model incorporates a custom architecture with flash attention and an increased intermediate MLP layer dimension. The training dataset is a diverse combination of wiki, stories, arxiv, math, and code.
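As a rough guide to what the 1.1 billion parameter count means for deployment, the following sketch estimates weight storage at several numeric precisions. It is back-of-the-envelope arithmetic, not a measurement: the function name and the list of formats are illustrative assumptions, and activations, the KV cache, and runtime overhead are ignored.

```python
# Rough weight-storage estimate for a 1.1B-parameter model.
# Weights only: activations, KV cache, and runtime overhead are excluded.
PARAM_COUNT = 1.1e9  # parameter count stated in the model description

# Bytes per parameter for common numeric formats (illustrative selection).
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16/bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_footprint_gb(param_count: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return param_count * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        print(f"{fmt:>18}: {weight_footprint_gb(PARAM_COUNT, nbytes):.2f} GB")
```

Under these assumptions the weights occupy about 4.4 GB in 32-bit floats, about 2.2 GB in 16-bit floats, and about 0.55 GB if quantized to 4 bits.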
Key Capabilities
- Custom Architecture: Uses flash attention and a larger intermediate MLP dimension, both intended to improve efficiency and performance.
- Pretrained from Scratch: Trained on a custom-curated 41 billion token dataset rather than initialized from an existing checkpoint.
- Fine-tuning Support: Provides scripts for easy fine-tuning on custom datasets using finetune.py.
- Inference Generation: Includes a generate.py script for straightforward text generation from the Hugging Face model hub.
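For readers who want to try generation without the repository scripts, here is a minimal sketch using the Hugging Face transformers library. It is not the contents of generate.py, which are not shown here. The function name, prompt, and generation settings are assumptions chosen for illustration, and because the model is described as having a custom architecture, loading may additionally require trust_remote_code=True or a GPU; that is not confirmed here.

```python
MODEL_ID = "budecosystem/boomer-1b"


def generate_text(prompt: str, max_new_tokens: int = 64,
                  trust_remote_code: bool = False) -> str:
    """Load the model from the Hugging Face hub and continue the prompt.

    Hypothetical helper for illustration; not part of the model repository.
    """
    # Heavy dependencies are imported lazily so defining this function is cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID, trust_remote_code=trust_remote_code
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, trust_remote_code=trust_remote_code
    )
    model.eval()

    # Tokenize the prompt and generate a continuation without tracking gradients.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate_text("The world is"))
```

The model is reloaded on every call to keep the sketch short; a real application would load the tokenizer and model once and reuse them.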
Performance Benchmarks
Evaluations on several benchmarks show its initial performance:
- ARC: 22.35
- MMLU: 25.92
- HumanEval: 6.1
- HellaSwag: 31.66
- BBH: 28.65
- DROP: 6.13
- GSM8K: 1.5
Good For
- Retrieval Augmentation: Can serve as the generation component in systems that supplement prompts with retrieved documents.
- Inference at the Edge: Its small parameter count (1.1 billion) makes it suitable for deployment in resource-constrained environments.
- Language Modeling Use Cases: Applicable to general language understanding and generation tasks.
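To make the retrieval augmentation use case concrete, the sketch below shows one way retrieved passages might be packed into a prompt before it is handed to the model. Everything in it is an assumption for illustration: the word-overlap retriever is a toy stand-in for a real search index, and the prompt template is hypothetical rather than a format the model is known to have been trained on.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Rank passages by how many words they share with the query (toy retriever)."""
    query_words = tokenize(query)
    ranked = sorted(
        passages,
        key=lambda passage: len(query_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Place the retrieved passages ahead of the question in a plain-text prompt."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    docs = [
        "Flash attention reduces memory traffic in transformers.",
        "Paris is the capital of France.",
        "The MLP layer follows attention in each block.",
    ]
    question = "What does flash attention reduce?"
    print(build_prompt(question, retrieve(question, docs, top_k=2)))
```

The resulting string would then be passed to the model as an ordinary generation prompt. Keeping retrieval separate from prompt construction means the toy retriever can be replaced by a real index without touching the template.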