allenai/OLMo-1B-0724-hf

Text Generation
Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · Published: Jun 15, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

OLMo-1B-0724-hf is a 1 billion parameter Transformer-style autoregressive language model developed by the Allen Institute for AI (AI2). This model is an improved version of the original OLMo 1B, featuring enhanced performance on benchmarks like HellaSwag due to an updated Dolma dataset and staged training. Designed to advance the science of language models, it offers a transparent and reproducible foundation for research and development.

Overview

OLMo-1B-0724-hf is a 1 billion parameter open language model from the Allen Institute for AI (AI2), designed to support scientific research on language models. This July 2024 release updates the original OLMo 1B and improves the HellaSwag score by 4.4 points, alongside gains on other evaluations. It was trained on Dolma v1.7, an enhanced version of the Dolma dataset, with a two-stage training curriculum; both changes contribute to the improved performance.

Key Capabilities

  • Improved Performance: Shows notable gains on benchmarks like HellaSwag compared to its predecessor, with an average score of 65.0 across various tasks.
  • Transparent Development: Released with all code, checkpoints, logs, and training details to enable reproducibility and scientific study.
  • Staged Training: Benefits from a two-stage training process, initially on the full Dolma 1.7 dataset, followed by an annealing phase on a higher-quality subset.
  • Hugging Face Integration: Directly compatible with Hugging Face Transformers from v4.40 onwards, supporting easy inference and fine-tuning.
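
The Hugging Face integration above can be sketched as a short loading and generation script. This is a minimal sketch, not an official recipe: it assumes `torch` and `transformers` are installed, that the model is loaded in BF16 to match the listed quantization, and that greedy decoding is acceptable. The helper names (`version_tuple`, `transformers_supports_olmo`, `generate`) are illustrative and not part of any library.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MODEL_ID = "allenai/OLMo-1B-0724-hf"
MIN_TRANSFORMERS = (4, 40)  # first release line named in the text above


def version_tuple(text: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '4.41.2'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def transformers_supports_olmo() -> bool:
    """Check whether the installed transformers is at least v4.40."""
    try:
        return version_tuple(version("transformers")) >= MIN_TRANSFORMERS
    except PackageNotFoundError:
        return False


def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Download the model on first use and greedily continue the prompt."""
    # Imported lazily so the version check above works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: BF16 weights, matching the "Quant: BF16" field of this card.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Language modeling is ")` downloads the weights on first use and returns the prompt followed by the model's continuation. Checking `transformers_supports_olmo()` first avoids an unhelpful error on older installations.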

Good For

  • Language Model Research: Ideal for researchers studying language model behavior, training methodologies, and scaling laws due to its open and transparent nature.
  • Fine-tuning: Provides multiple intermediate checkpoints for flexible fine-tuning on specific downstream tasks.
  • English NLP Tasks: Optimized for general English natural language processing applications.
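
Starting a fine-tuning run from an intermediate checkpoint might look like the sketch below. It assumes the checkpoints are published as branches of the model repository and that branch names contain a `step<N>` component; both are assumptions to verify against the repository itself. The helper names are illustrative, while `list_repo_refs` and the `revision` argument are existing `huggingface_hub` and `transformers` features.

```python
import re
from typing import Iterable, Optional

MODEL_ID = "allenai/OLMo-1B-0724-hf"
# Assumption: checkpoint branch names contain "step<N>"; adjust if they differ.
STEP_PATTERN = re.compile(r"step(\d+)")


def list_checkpoint_branches(repo_id: str = MODEL_ID) -> list[str]:
    """Return the branch names of the model repository (needs network access)."""
    from huggingface_hub import list_repo_refs

    refs = list_repo_refs(repo_id)
    return [branch.name for branch in refs.branches]


def latest_step_branch(branch_names: Iterable[str]) -> Optional[str]:
    """Pick the branch with the highest step number, or None if none match."""
    best_name, best_step = None, -1
    for name in branch_names:
        match = STEP_PATTERN.search(name)
        if match and int(match.group(1)) > best_step:
            best_name, best_step = name, int(match.group(1))
    return best_name


def load_checkpoint(revision: str):
    """Load the weights stored at a specific branch, tag, or commit."""
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=revision)
```

A typical flow would pass the output of `list_checkpoint_branches()` to `latest_step_branch`, or pick an earlier step by hand, and then hand the chosen name to `load_checkpoint` before fine-tuning.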