amd/AMD-OLMo-1B
AMD-OLMo-1B is a 1.2 billion parameter language model developed by AMD, based on the OLMo architecture and trained from scratch on AMD Instinct MI250 GPUs. Pre-trained on a 1.3 trillion token subset of Dolma v1.7, this model offers competitive performance among 1B-class models, particularly in commonsense reasoning and instruction-following after fine-tuning. It is designed for research purposes, providing a foundation for further development and evaluation on AMD hardware.
AMD-OLMo-1B Overview
AMD-OLMo-1B is a 1.2 billion parameter language model developed by AMD, built on the OLMo architecture and trained entirely on AMD Instinct MI250 GPUs. It is the base model of a series that also includes a supervised fine-tuned (SFT) version and a Direct Preference Optimization (DPO) aligned version, reflecting AMD's commitment to open-source AI development on its hardware.
Key Capabilities
- Foundation Model: The base AMD-OLMo-1B is pre-trained on a 1.3 trillion token subset of the Dolma v1.7 dataset.
- Instruction Following: The SFT variant (AMD-OLMo-1B-SFT) is fine-tuned on diverse datasets like Tulu V2, OpenHermes-2.5, WebInstructSub, and Code-Feedback, enhancing its ability to follow instructions.
- Human Alignment: The DPO-aligned version (AMD-OLMo-1B-SFT-DPO) is optimized for human preferences using the UltraFeedback dataset, improving conversational quality.
- Competitive Performance: Benchmarks show AMD-OLMo-1B models achieving strong results in their class, particularly on arc_easy, sciq, and mmlu for the instruction-tuned variants.
- Hardware Optimized: Specifically trained and optimized for AMD Instinct™ MI250 GPUs, showcasing efficient training throughput of 12,200 tokens/sec/GPU.
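To put the throughput figure in context, the following is a rough back-of-the-envelope sketch, not an official AMD calculation. It assumes the 12,200 tokens/sec/GPU rate is sustained for the whole run and that the 1.3 trillion tokens are processed once. The GPU count passed to `wall_clock_days` is purely hypothetical, since the text above does not state the cluster size.

```python
# Figures quoted in the text above.
PRETRAIN_TOKENS = 1.3e12          # 1.3 trillion tokens
TOKENS_PER_SEC_PER_GPU = 12_200   # reported training throughput


def gpu_hours(tokens: float, tokens_per_sec_per_gpu: float) -> float:
    """Total GPU-hours needed, assuming constant throughput."""
    return tokens / tokens_per_sec_per_gpu / 3600


def wall_clock_days(tokens: float, tokens_per_sec_per_gpu: float, num_gpus: int) -> float:
    """Wall-clock days on num_gpus GPUs, assuming perfect linear scaling."""
    return gpu_hours(tokens, tokens_per_sec_per_gpu) / num_gpus / 24


total = gpu_hours(PRETRAIN_TOKENS, TOKENS_PER_SEC_PER_GPU)
print(f"Estimated compute: {total:,.0f} GPU-hours")

# Hypothetical cluster size, chosen only for illustration.
days = wall_clock_days(PRETRAIN_TOKENS, TOKENS_PER_SEC_PER_GPU, num_gpus=64)
print(f"On 64 GPUs: about {days:.1f} days")
```

Under these assumptions the pre-training run works out to roughly 29,600 GPU-hours. Real runs include checkpointing, evaluation, and restarts, so the actual figure would likely be higher.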
Good For
- Research and Development: Ideal for researchers and developers exploring language models on AMD hardware.
- Instruction-Following Tasks: The SFT and DPO versions are suitable for applications requiring robust instruction adherence and conversational capabilities.
- Small-Scale Deployments: Its 1.2B parameter size makes it efficient for scenarios where larger models are impractical, especially on AMD GPUs.
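As a minimal sketch of what small-scale use might look like, the snippet below estimates the memory needed for the weights alone and loads the base model for text generation. It assumes the checkpoint is published on the Hugging Face Hub under the ID `amd/AMD-OLMo-1B`, that the `torch` and `transformers` packages are installed, and that the installed `transformers` version supports the OLMo architecture. The helper names `weight_memory_gib` and `generate` are illustrative and not part of any AMD API.

```python
MODEL_ID = "amd/AMD-OLMo-1B"  # assumed Hugging Face Hub ID, taken from the page title


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights only (no activations or KV cache), in GiB."""
    return num_params * bytes_per_param / 2**30


def generate(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 64) -> str:
    """Load the model and return the prompt plus its generated continuation."""
    # Imported lazily so weight_memory_gib works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # bfloat16 halves weight memory relative to float32 (an assumption about
    # what the target hardware handles well, not a documented requirement).
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# 1.2 billion parameters at 2 bytes each (bfloat16 / float16).
print(f"Weights in 16-bit precision: {weight_memory_gib(1.2e9, 2):.2f} GiB")
```

Calling `generate("What is a language model?")` downloads the weights on first use and runs on the CPU by default. Moving the model and inputs to a GPU is left out here to keep the sketch hardware-neutral.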