cretone/ultron_storm_sft_20231210
Ultron_storm_sft_20231210 is a 1.1-billion-parameter large language model from the Ultron series, developed by cretone. It uses Grouped Query Attention, supports a sequence length of 2048 tokens, and was trained on a 950-billion-token dataset. The model belongs to a family of LLMs ranging from 160M to 1.1B parameters.
Ultron_storm_sft_20231210 Overview
Ultron_storm_sft_20231210 is the 1.1-billion-parameter member of cretone's Ultron series, a family of large language models spanning 160 million to 1.1 billion parameters.
Key Technical Specifications
- Architecture: Employs Grouped Query Attention, which shares each key/value head across a group of query heads to shrink the key/value cache and speed up inference.
- Context Length: Supports a sequence length of 2048 tokens.
- Training Data: Trained on a dataset of 950 billion tokens.
- Learning Rate: The model was trained with a learning rate of 4e-4.
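To make the Grouped Query Attention spec above concrete, here is a minimal NumPy sketch of the mechanism: a small number of key/value heads is repeated so that each serves a group of query heads. This is an illustration of the general technique only, not the model's actual implementation; the function name and shapes are hypothetical.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Illustrative GQA: q has more heads than k/v; each k/v head
    is shared by a contiguous group of query heads.

    q: (seq, n_q_heads, d)    k, v: (seq, n_kv_heads, d)
    """
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    group = n_q_heads // n_kv_heads  # query heads per k/v head

    # Repeat each k/v head for its group of query heads.
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)

    d = q.shape[-1]
    # Scaled dot-product scores per head: (heads, seq_q, seq_k).
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d)

    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values, back to (seq, n_q_heads, d).
    return np.einsum("hqk,khd->qhd", weights, v)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8, 16))   # 8 query heads
k = rng.standard_normal((4, 2, 16))   # only 2 k/v heads
v = rng.standard_normal((4, 2, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (4, 8, 16)
```

Compared with full multi-head attention, only the key/value projections shrink, so the cache during generation is smaller by the group factor (4x in this sketch).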
Important Note
Note that this model is described as a placeholder and does not represent the final lineup of the Ultron series. Developers should weigh this when evaluating its long-term applicability or expected performance within the broader Ultron family.