H-D-T/Buzz-8b-Large-v0.5

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Tool Calling: Supported | Published: May 6, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

Buzz-8b-Large-v0.5 is an 8 billion parameter language model developed by Alignment Lab AI in collaboration with Hive Digital Technologies. This model is part of the Buzz series, focusing on advancing efficiency through iterative fine-tuning and optimization of existing pretrained language models. It aims to demonstrate the potential for continuous performance refinement with optimal use of computational resources. Buzz-8b-Large-v0.5 is designed as a completions model, generating text to complete given prompts.


Overview

Buzz-8b-Large-v0.5 is a language model developed by Alignment Lab AI and Hive Digital Technologies, forming part of the Buzz series which includes Buzz-2.5b-Small and Buzz-5b-Medium. The project emphasizes the reuse and optimization of existing pretrained language models through an iterative fine-tuning methodology. This approach combines high-quality data with carefully selected "grounding" distributions from previous training epochs to achieve cost-effective performance improvements.
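
The idea of mixing new data with a "grounding" sample from earlier training can be illustrated with a small sketch. This is not the authors' pipeline: the function name `mix_with_grounding`, the `grounding_fraction` parameter, its default value, and the uniform random selection are all assumptions made for illustration, since the description does not say how the grounding distributions are chosen or weighted.

```python
import random


def mix_with_grounding(new_examples, previous_examples, grounding_fraction=0.2, seed=0):
    """Return a shuffled mix of new data plus a sample of earlier training data.

    grounding_fraction is the share of the final mix drawn from
    previous_examples (a hypothetical knob; the source states no ratio).
    """
    if not 0.0 <= grounding_fraction < 1.0:
        raise ValueError("grounding_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Choose the replay count so replayed items form grounding_fraction of the mix.
    n_grounding = round(len(new_examples) * grounding_fraction / (1.0 - grounding_fraction))
    # Never ask for more earlier examples than are available.
    n_grounding = min(n_grounding, len(previous_examples))
    # Uniform sampling without replacement stands in for "careful selection".
    grounding = rng.sample(list(previous_examples), n_grounding)
    mixed = list(new_examples) + grounding
    rng.shuffle(mixed)
    return mixed
```

With 80 new examples and a `grounding_fraction` of 0.2, the sketch draws 20 earlier examples, so the final mix holds 100 items. A real selection step would presumably be more deliberate than uniform sampling.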

Key Capabilities

  • Iterative Fine-Tuning: Leverages research from papers like "Simple and Scalable Strategies to Continually Pre-train Large Language Models" and "NEFTune" to continuously refine model performance.
  • Completions Model: Primarily functions as a text completions model, generating continuations for input prompts.
  • Efficiency Focus: Aims to demonstrate efficient and effective locally runnable language models by optimizing FLOPs usage.
  • Toolkit for Community: The Buzz model, dataset, and codebase are intended to be released as a toolkit for the community to refine, filter, augment data, and train custom variants.

Usage Notes

  • The model is a completions model. For conversational use, append <|end_of_text|> <|begin_of_text|>assistant: to the prompt; the speaker role (assistant in this example) is flexible and can be replaced with another name.
  • Future iterations are expected to adopt formatting similar to OpenChat.
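
The conversational formatting described in the usage notes can be sketched as a small string helper. The suffix is copied from the note; the helper name `to_conversational_prompt`, the `role` parameter, and the choice to append the suffix directly with no extra whitespace are assumptions, so the result is worth checking against the model's actual behavior.

```python
# Suffix quoted from the usage note; "{role}" marks the flexible speaker role.
TURN_SUFFIX_TEMPLATE = "<|end_of_text|> <|begin_of_text|>{role}:"


def to_conversational_prompt(prompt: str, role: str = "assistant") -> str:
    """Append the turn suffix so a completions model continues as `role`."""
    # Only the template is formatted, so braces inside `prompt` are left untouched.
    return prompt + TURN_SUFFIX_TEMPLATE.format(role=role)


if __name__ == "__main__":
    print(to_conversational_prompt("user: What is a completions model?"))
```

The returned string is meant to be passed unchanged as the prompt to whatever completion API or generation call is in use.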

Research Foundation

The development is underpinned by research into continuous pre-training, noisy embeddings for instruction fine-tuning, and optimization techniques. Efforts to improve context handling are ongoing, including a collaboration with the developer of the Axolotl training framework.
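
For readers unfamiliar with noisy embeddings, the following is a minimal sketch of the noise rule described in the NEFTune paper: during fine-tuning, uniform noise in [-1, 1] is added to the token embeddings, scaled by alpha / sqrt(L * d) for sequence length L and embedding dimension d. It uses plain Python lists rather than a tensor library. The function name `add_neftune_noise` and the default `alpha` are illustrative, and the source does not say how, or with what settings, the technique was applied to Buzz.

```python
import math
import random


def add_neftune_noise(embeddings, alpha=5.0, seed=None):
    """Return a noised copy of a sequence of token embeddings.

    embeddings: list of L token vectors, each a list of d floats.
    alpha: noise scale hyperparameter (illustrative default).
    """
    rng = random.Random(seed)
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    # The NEFTune scaling factor shrinks noise for longer or wider inputs.
    scale = alpha / math.sqrt(seq_len * dim)
    # Build a new nested list so the caller's embeddings are not mutated.
    return [[x + scale * rng.uniform(-1.0, 1.0) for x in row] for row in embeddings]
```

The noise is a training-time regularizer only; at inference the embeddings are used unmodified.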