amd/AMD-OLMo-1B-SFT-DPO

TEXT GENERATIONConcurrency Cost:1Model Size:1BQuant:BF16Ctx Length:32kPublished:Oct 31, 2024License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The AMD-OLMo-1B-SFT-DPO is a 1.2 billion parameter language model developed by AMD, based on the OLMo architecture. This model has undergone supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) for alignment with human preferences, making it suitable for instruction-following and chat applications. It was trained from scratch on AMD Instinct™ MI250 GPUs, demonstrating competitive performance against other 1B-class models in instruction tuning and chat benchmarks.

Loading preview...

Overview

The AMD-OLMo-1B-SFT-DPO is a 1.2 billion parameter language model developed by AMD, built upon the OLMo architecture. It represents the DPO-aligned version in the AMD-OLMo series, following a pre-trained base model and a supervised fine-tuned (SFT) variant. The model was trained from scratch on AMD Instinct™ MI250 GPUs, utilizing a subset of the Dolma v1.7 dataset for pre-training. Its instruction-tuned capabilities were developed through a two-phase SFT process using datasets like Tulu V2, OpenHermes-2.5, WebInstructSub, and Code-Feedback, followed by Direct Preference Optimization (DPO) on the UltraFeedback dataset to align with human preferences.

Key Capabilities

  • Instruction Following: Excels in understanding and executing instructions due to extensive supervised fine-tuning.
  • Human Preference Alignment: Optimized using DPO to generate responses that are more aligned with human preferences, as evidenced by strong performance in chat benchmarks like AlpacaEval.
  • Competitive Performance: Demonstrates competitive results across various standard benchmarks for 1B-class models, particularly in instruction tuning and chat scenarios.
  • Hardware Optimized: Developed and trained on AMD Instinct™ MI250 GPUs, showcasing AMD's capabilities in large language model development.

Good For

  • Instruction-tuned applications: Ideal for tasks requiring precise instruction following.
  • Chatbots and conversational AI: Suitable for building interactive agents that produce human-like and preferred responses.
  • Research and development on AMD hardware: Provides a strong baseline for further experimentation and optimization on AMD GPUs.