amd/AMD-OLMo-1B-SFT

TEXT GENERATIONConcurrency Cost:1Model Size:1BQuant:BF16Ctx Length:32kPublished:Oct 31, 2024License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The AMD-OLMo-1B-SFT is a 1.2 billion parameter instruction-tuned causal language model developed by AMD, based on the OLMo architecture. It was fine-tuned on a two-phase dataset including Tulu V2, OpenHermes-2.5, WebInstructSub, and Code-Feedback, making it suitable for general instruction-following tasks. This model is part of a series trained from scratch on AMD Instinct™ MI250 GPUs, demonstrating competitive performance against other 1B-class models in instruction-following benchmarks.

Loading preview...

AMD-OLMo-1B-SFT: Instruction-Tuned Language Model by AMD

AMD-OLMo-1B-SFT is a 1.2 billion parameter language model developed by AMD, building upon the fully open-source OLMo-1B architecture. This specific version is supervised fine-tuned (SFT) through a two-phase process, first on the Tulu V2 dataset, followed by a mixture of OpenHermes-2.5, WebInstructSub, and Code-Feedback datasets.

Key Capabilities

  • Instruction Following: Excels in general instruction-following tasks due to its comprehensive SFT process.
  • Competitive Performance: Achieves strong results in instruction tuning benchmarks, including an average score of 51.60 across standard benchmarks and 4.35 on MTBench, outperforming several comparable 1B-class models.
  • Hardware Optimized: Trained from scratch on AMD Instinct™ MI250 GPUs, showcasing AMD's capabilities in large language model development.

Good For

  • General-purpose instruction following: Ideal for applications requiring a model to understand and execute diverse instructions.
  • Research and Development: Provides a robust base for further experimentation and fine-tuning, especially for those working with AMD hardware.
  • Benchmarking: Useful for comparing performance against other 1B parameter models in instruction-tuned scenarios.