budecosystem/boomer-1b

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Oct 3, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

budecosystem/boomer-1b is a 1.1 billion parameter language model developed by BudEcosystem, pretrained from scratch on a custom-curated dataset of 41 billion tokens. It features a custom architecture incorporating flash attention and a higher intermediate MLP dimension, designed for efficient language modeling. The model is suitable for retrieval augmentation, inference at the edge, and general language modeling tasks.

Overview

boomer-1b is BudEcosystem's 1.1 billion parameter language model, pretrained from scratch on a custom-curated dataset of 41 billion tokens. Its custom architecture uses flash attention and an increased intermediate MLP layer dimension. The training data combines wiki, stories, arXiv, math, and code.

Key Capabilities

  • Custom Architecture: Features flash attention and a higher intermediate MLP dimension for potentially improved efficiency and performance.
  • Pretrained from Scratch: Trained from random initialization on a custom-curated 41 billion token dataset rather than derived from an existing checkpoint.
  • Fine-tuning Support: Ships a finetune.py script for fine-tuning on custom datasets.
  • Inference Generation: Ships a generate.py script that generates text with the model loaded from the Hugging Face model hub.
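
For readers who want to try generation without the bundled script, the following is a minimal sketch and not the contents of generate.py. It assumes the checkpoint loads through the generic `AutoTokenizer` and `AutoModelForCausalLM` classes of the Hugging Face `transformers` library, and that `torch` is installed. The function name `generate_text` and its default arguments are hypothetical.

```python
MODEL_ID = "budecosystem/boomer-1b"  # repository name taken from this page


def generate_text(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint from the Hub and return the prompt plus its continuation."""
    # Imported lazily so that defining this function requires neither library.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository works with the generic Auto* loaders.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
        )
    # Decode the full sequence, which includes the original prompt tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("The capital of France is")` downloads the weights on first use, so it needs network access and enough memory to hold the model.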

Performance Benchmarks

Evaluations on several benchmarks show its initial performance:

  • ARC: 22.35
  • MMLU: 25.92
  • HumanEval: 6.1
  • HellaSwag: 31.66
  • BBH: 28.65
  • DROP: 6.13
  • GSM8K: 1.5
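
To put the multiple-choice scores in context, the sketch below compares them with a random-guess baseline. It assumes the scores are accuracy percentages and that ARC, MMLU, and HellaSwag can each be treated as four-option tasks, so chance is about 25%. This is an approximation (some ARC questions have a different number of options), and the page does not state the evaluation setup.

```python
# Scores reported on this page, in percent.
reported = {"ARC": 22.35, "MMLU": 25.92, "HellaSwag": 31.66}

# Assumed chance level for four-option multiple choice (an approximation).
CHANCE = 25.0

# Margin of each reported score over the assumed chance level.
margins = {name: round(score - CHANCE, 2) for name, score in reported.items()}

for name, margin in margins.items():
    print(f"{name}: {margin:+.2f} points relative to chance")
```

Under these assumptions, only the HellaSwag score sits clearly above the random-guess level.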

Good For

  • Retrieval Augmentation: Can serve as the language model component in retrieval-augmented systems.
  • Inference at the Edge: Its small parameter count (1.1 billion) makes it suitable for deployment in resource-constrained environments.
  • Language Modeling Use Cases: Applicable for various general language understanding and generation tasks.
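
As a rough guide to the edge-deployment point above, the weight footprint can be estimated from the parameter count alone. The sketch below is back-of-the-envelope arithmetic: it counts only the weights (no activations, KV cache, or runtime overhead), and the helper name `weight_memory_gb` is hypothetical.

```python
PARAMS = 1.1e9  # parameter count stated on this page

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "fp8/int8": 1}


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Return the size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


footprint = {
    precision: weight_memory_gb(PARAMS, nbytes)
    for precision, nbytes in BYTES_PER_PARAM.items()
}

for precision, gigabytes in footprint.items():
    print(f"{precision}: about {gigabytes:.1f} GB of weights")
```

Actual memory use during inference will be higher than these figures, since the estimate ignores everything except the stored weights.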