saketlab/seqoutlm-0.5B

Text Generation
Concurrency Cost: 1
Model Size: 0.5B
Quant: BF16
Ctx Length: 32k
Tool Calling: Supported
Published: Jun 12, 2026
License: apache-2.0
Architecture: Transformer
Open Weights
Cold

SeqoutLM 0.5B is a 500-million-parameter biomedical metadata normalization model developed by saketlab. Fine-tuned from Llama 3.2 1B Instruct using QLoRA, it converts unstructured genomic sample metadata into a standardized 16-field JSON representation. The model is designed for large-scale metadata harmonization across public genomics repositories such as GEO and SRA, supporting downstream search, filtering, and analytics workflows.


SeqoutLM 0.5B: Biomedical Metadata Normalization

SeqoutLM 0.5B takes unstructured genomic sample metadata and maps it onto a fixed 16-field JSON schema. It is intended for harmonizing heterogeneous metadata from public repositories such as GEO and SRA, so that samples described in different ways can be integrated and analyzed together.

Key Capabilities

  • Standardized Output: Always produces a JSON object with 16 predefined fields (e.g., organism, tissue, disease, assay).
  • Missing Value Handling: Outputs null for fields that cannot be determined from the input text.
  • Biomedical Focus: Specifically trained on the saketlab/seqout-normalized-conversation dataset, comprising over 600K samples of free-text biomedical metadata paired with normalized JSON targets.
  • Efficient Fine-tuning: Built on Llama 3.2 1B Instruct and fine-tuned with QLoRA using the Unsloth training stack, which keeps the memory requirements of fine-tuning low.
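
The output contract described above (a JSON object with a fixed set of fields, with null for anything undetermined) can be checked on the consumer side. The following is a minimal sketch, not part of the model's own tooling. The function name parse_normalized_record is hypothetical, and EXAMPLE_FIELDS contains only the four field names mentioned in this card; the remaining twelve are not listed here, so the full field list would need to be supplied by the caller.

```python
import json
from typing import Any, Dict, Iterable, Optional

# Assumption: only these four field names are given in the model card.
# The full 16-field schema is not listed here, so this is an illustrative subset.
EXAMPLE_FIELDS = ("organism", "tissue", "disease", "assay")


def parse_normalized_record(
    raw_output: str,
    expected_fields: Iterable[str] = EXAMPLE_FIELDS,
) -> Dict[str, Optional[Any]]:
    """Parse model output and check it against a fixed field list.

    Fields absent from the output are filled with None, mirroring the
    card's "null for undetermined fields" convention. Fields outside the
    expected list are rejected.
    """
    record = json.loads(raw_output)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object at the top level")

    fields = list(expected_fields)
    unexpected = sorted(set(record) - set(fields))
    if unexpected:
        raise ValueError(f"unexpected fields in output: {unexpected}")

    # Return the fields in schema order, with None for anything missing.
    return {name: record.get(name) for name in fields}
```

Rejecting unknown fields, rather than silently dropping them, is a design choice in this sketch: it surfaces schema drift early when processing large batches.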

Good For

  • Large-scale Metadata Harmonization: Ideal for standardizing vast amounts of genomic sample metadata.
  • Enabling Downstream Analytics: Facilitates improved search, filtering, and integration of biomedical datasets.
  • Automated Data Curation: Converts varied free-text descriptions into a structured, queryable format without manual curation.
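
As an illustration of the downstream filtering mentioned above, the sketch below selects normalized records by field value. It assumes records have already been parsed into Python dictionaries; filter_records is a hypothetical helper, and case-insensitive string matching is an assumption about how values would be compared, not something the model card specifies.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def _matches(value: Optional[Any], wanted: str) -> bool:
    """Case-insensitive comparison; a null (None) field never matches."""
    if value is None:
        return False
    return str(value).strip().lower() == wanted.strip().lower()


def filter_records(
    records: Iterable[Dict[str, Optional[Any]]],
    **criteria: str,
) -> Iterator[Dict[str, Optional[Any]]]:
    """Yield records whose fields match every given criterion."""
    for record in records:
        if all(_matches(record.get(field), wanted)
               for field, wanted in criteria.items()):
            yield record
```

For example, calling filter_records(records, organism="homo sapiens", tissue="liver") would yield only the records whose organism and tissue fields both match, skipping any record where either field is null.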