amd/AMD-OLMo-1B

Text Generation | Concurrency Cost: 1 | Model Size: 1B | Quant: BF16 | Ctx Length: 32k | Published: Oct 31, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

AMD-OLMo-1B is a 1.2 billion parameter language model developed by AMD, based on the OLMo architecture and trained from scratch on AMD Instinct MI250 GPUs. Pre-trained on a 1.3 trillion token subset of Dolma v1.7, this model offers competitive performance among 1B-class models, particularly in commonsense reasoning and instruction-following after fine-tuning. It is designed for research purposes, providing a foundation for further development and evaluation on AMD hardware.


AMD-OLMo-1B Overview

AMD-OLMo-1B is a 1.2 billion parameter language model developed by AMD. It uses the OLMo architecture and was trained entirely on AMD Instinct MI250 GPUs. The model is part of a series that also includes a supervised fine-tuned (SFT) version and a Direct Preference Optimization (DPO) aligned version, reflecting AMD's commitment to open-source AI development on its own hardware.

Key Capabilities

  • Foundation Model: The base AMD-OLMo-1B is pre-trained on a 1.3 trillion token subset of the Dolma v1.7 dataset.
  • Instruction Following: The SFT variant (AMD-OLMo-1B-SFT) is fine-tuned on diverse datasets like Tulu V2, OpenHermes-2.5, WebInstructSub, and Code-Feedback, enhancing its ability to follow instructions.
  • Human Alignment: The DPO-aligned version (AMD-OLMo-1B-SFT-DPO) is optimized for human preferences using the UltraFeedback dataset, improving conversational quality.
  • Competitive Performance: Benchmarks show AMD-OLMo-1B models achieving strong results in their class, particularly in areas like arc_easy, sciq, and mmlu for instruction-tuned variants.
  • Hardware Optimized: Trained and optimized on AMD Instinct™ MI250 GPUs, with a reported training throughput of 12,200 tokens/sec/GPU.
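
To put the throughput figure in context, the short calculation below converts the two numbers quoted above (1.3 trillion training tokens and 12,200 tokens/sec/GPU) into an approximate compute budget. It is a back-of-the-envelope sketch, not a reported result: it assumes a single pass over the data at sustained throughput with no restarts, and the GPU count used for the wall-clock estimate is a hypothetical value chosen for illustration, since this page does not state the cluster size.

```python
# Figures quoted in the text above.
TRAINING_TOKENS = 1.3e12           # 1.3 trillion tokens
TOKENS_PER_SEC_PER_GPU = 12_200    # reported training throughput

# Assumption for illustration only: the cluster size is not given on this page.
HYPOTHETICAL_NUM_GPUS = 64


def gpu_hours(tokens: float, tokens_per_sec_per_gpu: float) -> float:
    """Total GPU-hours needed to process `tokens` at a sustained per-GPU rate."""
    gpu_seconds = tokens / tokens_per_sec_per_gpu
    return gpu_seconds / 3600


def wall_clock_days(tokens: float, tokens_per_sec_per_gpu: float, num_gpus: int) -> float:
    """Elapsed days if the work is split evenly across `num_gpus` GPUs."""
    return gpu_hours(tokens, tokens_per_sec_per_gpu) / num_gpus / 24


total_gpu_hours = gpu_hours(TRAINING_TOKENS, TOKENS_PER_SEC_PER_GPU)
days = wall_clock_days(TRAINING_TOKENS, TOKENS_PER_SEC_PER_GPU, HYPOTHETICAL_NUM_GPUS)

print(f"Estimated compute: {total_gpu_hours:,.0f} GPU-hours")
print(f"On {HYPOTHETICAL_NUM_GPUS} GPUs (hypothetical): {days:.1f} days")
```

Under these assumptions the pre-training run works out to roughly 29,600 GPU-hours. The wall-clock figure scales inversely with whatever GPU count is substituted for the hypothetical one.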

Good For

  • Research and Development: Ideal for researchers and developers exploring language models on AMD hardware.
  • Instruction-Following Tasks: The SFT and DPO versions are suitable for applications requiring robust instruction adherence and conversational capabilities.
  • Small-Scale Deployments: Its 1.2B parameter size makes it efficient for scenarios where larger models are impractical, especially on AMD GPUs.
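
As a rough guide for small-scale deployments, the sketch below estimates the memory needed for the weights alone and shows one way the model might be loaded with the Hugging Face transformers library. It makes several assumptions: that torch and transformers are installed, that the repository ID amd/AMD-OLMo-1B from the page title resolves on the Hugging Face Hub, and that the SFT and DPO variants live under the same amd/ prefix. The helper names are hypothetical, and the estimate ignores activations, the KV cache, and framework overhead.

```python
MODEL_ID = "amd/AMD-OLMo-1B"   # repository ID from the page title
PARAM_COUNT = 1.2e9            # 1.2 billion parameters, as stated above
BYTES_PER_PARAM_BF16 = 2       # BF16 stores each weight in 2 bytes


def weight_memory_gb(param_count: float, bytes_per_param: int) -> float:
    """Approximate size of the weights in decimal gigabytes (weights only)."""
    return param_count * bytes_per_param / 1e9


def load_model(model_id: str = MODEL_ID):
    """Hypothetical helper: load tokenizer and model in BF16.

    Imports are done lazily so the estimate above can be used
    without torch or transformers installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


print(f"Weights in BF16: about {weight_memory_gb(PARAM_COUNT, BYTES_PER_PARAM_BF16):.1f} GB")

# Example usage (downloads the weights, so it is left commented out):
# tokenizer, model = load_model()
# inputs = tokenizer("What is a transformer?", return_tensors="pt")
# output_ids = model.generate(**inputs, max_new_tokens=50)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

At 2 bytes per parameter, the 1.2B weights take about 2.4 GB in BF16. For instruction-following use, passing the SFT or DPO variant's repository ID to load_model in place of the default should work the same way, assuming those IDs follow the same naming.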