instruction-pretrain/medicine-Llama3-8B

Text generation · Model size: 8B · Quantization: FP8 · Context length: 8K · Published: Jun 19, 2024 · License: llama3 · Architecture: Transformer

instruction-pretrain/medicine-Llama3-8B is a biomedicine-specific language model developed by instruction-pretrain, built upon Llama3-8B. The model was trained with the "Instruction Pre-Training" framework, which augments raw corpora with instruction-response pairs to improve domain-adaptive continual pre-training. It achieves performance comparable to, and in some cases exceeding, Llama3-70B on domain-specific tasks, making it well suited for specialized biomedical natural language processing.


Instruction Pre-Training: Biomedicine-Llama3-8B

This model, developed by instruction-pretrain, is a specialized version of Llama3-8B, fine-tuned for the biomedicine domain using a novel "Instruction Pre-Training" framework. This approach involves augmenting massive raw corpora with instruction-response pairs generated by an efficient instruction synthesizer built on open-source models.

Key Capabilities & Differentiators

  • Enhanced Domain Adaptation: Outperforms vanilla pre-training in domain-adaptive continual pre-training, specifically within the biomedicine sector.
  • Scalable Augmentation: Leverages an instruction synthesizer to create up to 500 million synthesized instruction-response pairs, scaling pre-trained tokens to 250 billion.
  • Performance: In continual pre-training, this Llama3-8B variant achieves performance comparable to or even surpassing Llama3-70B on domain-specific tasks.
  • Base Model: Built upon the pre-trained base Llama3-8B model, not the instruction-tuned version; during pre-training, only the BOS and EOS special tokens are added (no chat template).
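The core recipe can be illustrated with a minimal sketch: raw domain text is concatenated with synthesized instruction-response pairs into a single pre-training sequence, using only BOS and EOS special tokens. The "Question:/Answer:" template and the example pair below are illustrative assumptions; in the actual framework the pairs come from the project's instruction synthesizer, not hand-written examples.

```python
# Minimal sketch of instruction-augmented pre-training text (format assumed).
# In the real pipeline, qa_pairs are produced by the instruction synthesizer.

def augment_with_instructions(raw_text, qa_pairs,
                              bos="<|begin_of_text|>",
                              eos="<|end_of_text|>"):
    """Concatenate raw corpus text with instruction-response pairs.

    Only BOS and EOS special tokens are used, matching the base
    (non-instruction-tuned) Llama3-8B setup described above.
    """
    parts = [raw_text]
    for question, answer in qa_pairs:
        parts.append(f"Question: {question}\nAnswer: {answer}")
    return bos + "\n\n".join(parts) + eos

sample = augment_with_instructions(
    "Aspirin irreversibly inhibits cyclooxygenase enzymes.",
    [("Which enzymes does aspirin inhibit?",
      "Cyclooxygenase (COX) enzymes.")],
)
print(sample)
```

Sequences like this are then used for continual pre-training exactly as plain text would be, which is what lets the approach scale to hundreds of millions of synthesized pairs.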

Use Cases

  • Biomedical NLP: Ideal for applications requiring deep understanding and generation within the biomedical field.
  • Research: Useful for researchers exploring advanced pre-training techniques and domain adaptation for large language models.
  • Evaluation: Provides a robust model for evaluating domain-specific tasks, with scripts available for Hugging Face models.
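Because this is a base (non-chat) model, it is typically loaded with the standard Hugging Face `transformers` API and prompted with plain text rather than a chat template. The sketch below is a minimal, hedged example: the generation settings are illustrative assumptions, and running it requires the model weights and enough memory for an 8B model.

```python
# Hedged sketch: loading and prompting the base model with plain text.
# Requires the `transformers` library and sufficient GPU/CPU memory;
# dtype, device placement, and decoding settings are assumptions.

def generate(prompt, max_new_tokens=128):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "instruction-pretrain/medicine-Llama3-8B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# A plain-text completion prompt, since the model is not chat-tuned.
prompt = "Question: What class of drugs does metformin belong to?\nAnswer:"
```

Prompting with a completion-style "Question:/Answer:" prefix, rather than a chat template, matches the base-model setup noted above.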