Overview

The AMD-OLMo-1B-SFT-DPO is a 1.2 billion parameter language model developed by AMD, built upon the OLMo architecture. It represents the DPO-aligned version in the AMD-OLMo series, following a pre-trained base model and a supervised fine-tuned (SFT) variant. The model was trained from scratch on AMD Instinct™ MI250 GPUs, utilizing a subset of the Dolma v1.7 dataset for pre-training. Its instruction-tuned capabilities were developed through a two-phase SFT process using datasets like Tulu V2, OpenHermes-2.5, WebInstructSub, and Code-Feedback, followed by Direct Preference Optimization (DPO) on the UltraFeedback dataset to align with human preferences.

Key Capabilities

Instruction Following: Excels in understanding and executing instructions due to extensive supervised fine-tuning.
Human Preference Alignment: Optimized using DPO to generate responses that are more aligned with human preferences, as evidenced by strong performance in chat benchmarks like AlpacaEval.
Competitive Performance: Demonstrates competitive results across various standard benchmarks for 1B-class models, particularly in instruction tuning and chat scenarios.
Hardware Optimized: Developed and trained on AMD Instinct™ MI250 GPUs, showcasing AMD's capabilities in large language model development.

Good For

Instruction-tuned applications: Ideal for tasks requiring precise instruction following.
Chatbots and conversational AI: Suitable for building interactive agents that produce human-like and preferred responses.
Research and development on AMD hardware: Provides a strong baseline for further experimentation and optimization on AMD GPUs.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)