GreatCaptainNemo/ProLLaMA_Stage_1

Hosted on Hugging Face

Task: text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Apr 25, 2024 · License: apache-2.0 · Architecture: Transformer

ProLLaMA_Stage_1 by GreatCaptainNemo is a 7 billion parameter protein large language model based on Llama-2-7b, designed for multi-task protein language processing. This model specializes in generating and understanding protein sequences, accepting partial sequences as input. It is particularly suited for bioinformatics applications requiring protein sequence prediction and analysis.


ProLLaMA_Stage_1: A Protein-Specific Language Model

ProLLaMA_Stage_1 is a 7 billion parameter protein large language model (LLM) developed by GreatCaptainNemo, built upon the Llama-2-7b architecture. This model is specifically engineered for multi-task protein language processing, focusing on the generation and understanding of protein sequences.

Key Capabilities

  • Protein Sequence Generation: Capable of generating full protein sequences from partial inputs.
  • Llama-2 Base: Leverages the robust architecture of Llama-2-7b, adapting it for biological sequence data.
  • Interactive and Batch Processing: Supports both interactive input for single sequence generation and batch processing from input files for larger tasks.
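Batch processing amounts to wrapping each partial sequence from an input file in the model's prompt format before generation. A minimal sketch, assuming a plain-text file with one partial sequence per line (the helper name and file layout are illustrative, not part of the official scripts):

```python
def build_prompts(path: str) -> list[str]:
    """Read one partial protein sequence per line and wrap each
    in the Seq=< prefix the model expects."""
    prompts = []
    with open(path) as f:
        for line in f:
            seq = line.strip()
            if seq:  # skip blank lines
                prompts.append(f"Seq=<{seq}")
    return prompts
```

The resulting prompt list can then be fed to the model in batches with any standard generation loop.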

Input Format

The model expects protein sequences, full or partial, prefixed with Seq=<. For example, Seq=<MAPGGMPRE asks the model to continue a protein whose sequence begins with the residues MAPGGMPRE.
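With that prompt format, sequence generation through the Hugging Face transformers library might look like the following sketch. The sampling parameters (max_new_tokens, top_p) are illustrative assumptions, not values recommended by the model authors:

```python
def make_prompt(partial_sequence: str) -> str:
    """Prefix a (possibly empty) partial sequence with Seq=<
    as the model's input format requires."""
    return f"Seq=<{partial_sequence}"


if __name__ == "__main__":
    # Heavy: downloads ~7B parameters of weights. Guarded so the
    # helper above can be imported without pulling in transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "GreatCaptainNemo/ProLLaMA_Stage_1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    inputs = tokenizer(make_prompt("MAPGGMPRE"), return_tensors="pt").to(model.device)
    # Sample a completion of the partial sequence (parameters are assumptions).
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The generated text echoes the prompt followed by the model's continuation of the protein sequence.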

Use Cases

  • Bioinformatics Research: Ideal for researchers working on protein design, engineering, and sequence analysis.
  • Drug Discovery: Can assist in generating novel protein sequences for therapeutic development.
  • Educational Tools: Useful for demonstrating protein language modeling concepts and applications.