Name: saketlab/seqoutlm-0.5B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: saketlab

SeqoutLM 0.5B: Biomedical Metadata Normalization

SeqoutLM 0.5B is a specialized language model for biomedical metadata normalization. It takes unstructured genomic sample metadata and maps it onto a fixed 16-field JSON schema. It is intended for harmonizing heterogeneous metadata from public repositories such as GEO and SRA, so that datasets can be integrated and analyzed more easily.

Key Capabilities

Standardized Output: Always produces a JSON object with 16 predefined fields (e.g., organism, tissue, disease, assay).
Missing Value Handling: Outputs null for fields that cannot be determined from the input text.
Biomedical Focus: Specifically trained on the saketlab/seqout-normalized-conversation dataset, comprising over 600K samples of free-text biomedical metadata paired with normalized JSON targets.
Efficient Fine-tuning: Built on Llama 3.2 1B Instruct and fine-tuned with the Unsloth training stack using QLoRA, which trains low-rank adapters on top of a quantized base model to reduce memory requirements.
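
The output contract described above (a JSON object with fixed fields, and null for anything undetermined) can be checked on the consumer side. The following is a minimal sketch, not part of the model's tooling. Only the four field names mentioned above are used, since the full 16-field schema is not listed here. The function name parse_normalized and the sample output string are hypothetical.

```python
import json

# Only these four field names appear in the description above. The full
# 16-field schema is not listed here, so extend this tuple from the model card.
KNOWN_FIELDS = ("organism", "tissue", "disease", "assay")


def parse_normalized(raw_output, expected_fields=KNOWN_FIELDS):
    """Parse a model response and report schema problems.

    Returns (record, missing, nulls) where `missing` lists expected keys
    absent from the object and `nulls` lists expected keys set to null.
    """
    record = json.loads(raw_output)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in expected_fields if f not in record]
    nulls = [f for f in expected_fields if f in record and record[f] is None]
    return record, missing, nulls


# Hand-written stand-in for a model response, for illustration only.
example = (
    '{"organism": "Homo sapiens", "tissue": "liver", '
    '"disease": null, "assay": "RNA-Seq"}'
)
record, missing, nulls = parse_normalized(example)
print(missing)  # []
print(nulls)    # ['disease']
```

Separating missing keys from null values keeps a schema violation (a key the model failed to emit) distinct from a legitimate "could not be determined" answer.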

Good For

Large-scale Metadata Harmonization: Ideal for standardizing vast amounts of genomic sample metadata.
Enabling Downstream Analytics: Facilitates improved search, filtering, and integration of biomedical datasets.
Automated Data Curation: Converts varied free-text descriptions into a structured, queryable format without manual curation.
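
Once records share one schema, the search and filtering mentioned above reduce to ordinary dictionary operations. The sketch below assumes normalized records have already been parsed into Python dicts. The helper names filter_records and null_counts and the sample records are hypothetical and written by hand, not model output.

```python
from collections import Counter


def filter_records(records, **criteria):
    """Return records whose fields equal every given key=value criterion."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in criteria.items())
    ]


def null_counts(records, fields):
    """Count, per field, how many records have a null (or absent) value."""
    counts = Counter()
    for r in records:
        for f in fields:
            if r.get(f) is None:
                counts[f] += 1
    return dict(counts)


# Hand-written sample records using only field names mentioned above.
samples = [
    {"organism": "Homo sapiens", "tissue": "liver", "disease": None, "assay": "RNA-Seq"},
    {"organism": "Homo sapiens", "tissue": None, "disease": None, "assay": "ChIP-Seq"},
    {"organism": "Mus musculus", "tissue": "brain", "disease": None, "assay": "RNA-Seq"},
]

human_rnaseq = filter_records(samples, organism="Homo sapiens", assay="RNA-Seq")
print(len(human_rnaseq))  # 1
print(null_counts(samples, ["tissue", "disease"]))  # {'disease': 3, 'tissue': 1}
```

Per-field null counts give a quick view of which fields the source metadata rarely supports, which is useful before relying on a field for filtering.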
