allenai/OLMo-1B-hf

Text Generation | Concurrency Cost: 1 | Model Size: 1B | Quant: BF16 | Ctx Length: 32k | Published: Apr 12, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

OLMo-1B-hf is a 1-billion-parameter autoregressive Transformer language model developed by the Allen Institute for AI (AI2). Part of the Open Language Models (OLMo) series, it is trained on the Dolma dataset with a 2048-token context length. The model is designed to advance the science of language models by providing fully open access to training code, checkpoints, and logs. Its small size balances performance and efficiency for research and development in natural language processing.

OLMo-1B-hf: An Open Language Model for Scientific Advancement

OLMo-1B-hf is a 1-billion-parameter autoregressive Transformer language model developed by the Allen Institute for AI (AI2). It is part of the broader OLMo (Open Language Models) series, which emphasizes transparency and reproducibility in language model research. The model is trained on the Dolma dataset and has a 2048-token context length.

Key Capabilities & Features

  • Fully Open: OLMo-1B-hf is released with all training code, checkpoints, and logs, enabling researchers to deeply understand and build upon its development.
  • Transformer Architecture: Uses a standard Transformer architecture with 16 layers, a hidden size of 2048, and 16 attention heads.
  • Performance: Competitive among 1B-parameter models, with an average score of 62.42 on core benchmark tasks, ahead of Pythia 1B and TinyLlama 1.1B.
  • Hugging Face Compatibility: Provided in a Hugging Face Transformers format for easy integration and use.
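
As a minimal sketch of the Hugging Face integration mentioned above, the following loads the model through the Transformers `AutoModelForCausalLM` and `AutoTokenizer` classes and generates a continuation. It assumes PyTorch and a recent `transformers` release with OLMo support are installed. The helper names `prompt_token_budget` and `generate` are illustrative and not part of any library; the prompt is truncated so that prompt and generated tokens together stay within the 2048-token context length.

```python
MODEL_ID = "allenai/OLMo-1B-hf"
CONTEXT_LENGTH = 2048  # context length stated in the model description


def prompt_token_budget(max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many prompt tokens fit once room is reserved for generation."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    return context_length - max_new_tokens


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and greedily continue `prompt` (hypothetical helper)."""
    # Imported lazily so the budget helper works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Truncate the prompt so that prompt + new tokens fit in the context window.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        return_token_type_ids=False,
        truncation=True,
        max_length=prompt_token_budget(max_new_tokens),
    )
    # do_sample=False selects greedy decoding.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Language modeling is")` downloads the weights on first use. Greedy decoding is chosen here only to keep the sketch deterministic; sampling parameters can be passed to `model.generate` instead.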

Good For

  • Language Model Research: Ideal for researchers studying language model behavior, training dynamics, and architectural variations due to its complete transparency.
  • Fine-tuning: Serves as a strong base model for fine-tuning on specific downstream tasks, with intermediate checkpoints available for more granular control.
  • Resource-Efficient Applications: Its 1-billion-parameter size suits applications where compute and memory are constrained, offering a balance between performance and efficiency.
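
To make the resource claim concrete, the sketch below gives a rough parameter and memory estimate from the architecture figures listed under Key Capabilities & Features (16 layers, hidden size 2048, 16 heads). The MLP width, vocabulary size, tied embeddings, gated MLP, and absence of biases and learned normalization weights are assumptions not stated in this description, so treat the result as a back-of-the-envelope figure rather than the model's official count. `OlmoShape`, `estimate_parameters`, and `weight_memory_gib` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OlmoShape:
    n_layers: int = 16            # from the model description
    hidden_size: int = 2048       # from the model description
    n_heads: int = 16             # from the model description
    mlp_size: int = 8192          # ASSUMPTION: not stated in the description
    vocab_size: int = 50304       # ASSUMPTION: not stated in the description
    tied_embeddings: bool = True  # ASSUMPTION: input and output embeddings shared

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.n_heads


def estimate_parameters(shape: OlmoShape) -> int:
    """Estimate weight count, assuming no biases and no learned norm weights."""
    d = shape.hidden_size
    attention = 4 * d * d          # Q, K, V and output projections
    mlp = 3 * d * shape.mlp_size   # gate, up and down projections (assumed gated MLP)
    embeddings = shape.vocab_size * d
    if not shape.tied_embeddings:
        embeddings *= 2            # separate output projection
    return shape.n_layers * (attention + mlp) + embeddings


def weight_memory_gib(n_params: int, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone; 2 bytes per parameter corresponds to BF16."""
    return n_params * bytes_per_param / 2**30


shape = OlmoShape()
n_params = estimate_parameters(shape)
print(f"head dim: {shape.head_dim}")
print(f"parameters: {n_params:,}")
print(f"BF16 weights: {weight_memory_gib(n_params):.2f} GiB")
```

Under these assumptions the estimate comes to about 1.18 billion parameters, or roughly 2.2 GiB of weights in BF16. Activations, the KV cache, and any optimizer state for fine-tuning add to that.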