cretone/ultron_storm_sft_20231210

Text Generation · Concurrency Cost: 1 · Model Size: 1.1B · Quant: BF16 · Ctx Length: 2k · Architecture: Transformer Gated Cold

Ultron_storm_sft_20231210 is a 1.1 billion parameter large language model from the Ultron series, developed by cretone. It uses Grouped Query Attention and supports a sequence length of 2048 tokens. Trained on a 950 billion token dataset, it belongs to a family of LLMs ranging from 160M to 1.1B parameters.


Ultron_storm_sft_20231210 Overview

This model, developed by cretone, is part of the Ultron series of large language models, which spans 160 million to 1.1 billion parameters. At 1.1 billion parameters, Ultron_storm_sft_20231210 sits at the top of that range.

Key Technical Specifications

  • Architecture: Uses Grouped Query Attention, which shrinks the key-value cache and speeds up inference.
  • Context Length: Supports a sequence length of 2048 tokens (see the usage sketch after this list).
  • Training Data: Trained on 950 billion tokens.
  • Learning Rate: Trained with a learning rate of 4e-4.
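
As a rough illustration of how a model with these specifications might be used, the sketch below loads the checkpoint with Hugging Face transformers in BF16 and keeps the prompt within the 2048-token context window. The repository id is taken from this page; its availability through transformers, and the exact loading arguments, are assumptions rather than documented usage.

```python
# Minimal sketch: loading ultron_storm_sft_20231210 with Hugging Face transformers.
# Assumes the checkpoint is published under the id shown on this page and is
# compatible with AutoModelForCausalLM; adjust the id or path if hosted elsewhere.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cretone/ultron_storm_sft_20231210"  # id from this page (availability assumed)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
)

prompt = "Explain grouped query attention in one paragraph."
# Keep inputs within the 2048-token context window noted in the specs.
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in bfloat16 mirrors the BF16 precision listed in the metadata; on hardware without BF16 support, float16 or float32 would be the usual fallback.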

Important Note

This model is described as a placeholder and does not represent the final lineup of the Ultron series. Developers should keep this in mind when evaluating its long-term applicability or expected performance within the broader Ultron family.