Name: northenlab/soilfm-qwen2.5-14b-literature-cpt API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: northenlab

SoilFM Language Tower: Domain-Adapted for Soil Science

This model, northenlab/soilfm-qwen2.5-14b-literature-cpt, is a specialized large language model (LLM) built upon the 14.2 billion parameter Qwen2.5-14B-Instruct architecture. Developed by the Northen Lab at Lawrence Berkeley National Laboratory, it is the "Language Tower" component of the multi-modal SoilFM2 foundation model for soil microbiome analysis. Its primary differentiator is its domain adaptation to soil science and soil microbiology through continued pretraining.

Key Capabilities & Features

Domain-Specific Knowledge: Fine-tuned on 200,000 curated text passages from sources like PubMed Central soil microbiology papers, Wikipedia soil science articles, and the USDA Soil Survey Manual.
High Context Length: Inherits a 32,768-token context window from its base model, suitable for processing extensive scientific texts.
Efficient Training: Utilized QLoRA (4-bit NF4) for continued pretraining, making the process efficient while achieving a 7.2% improvement in validation loss over 1,500 steps.
Integration with SoilFM2: Designed to provide domain-grounded context within the broader SoilFM2 multi-modal pipeline, supporting applications like prebiotic recommendation.

Intended Uses

Generating detailed explanations of complex soil microbial processes, rhizosphere ecology, and plant-microbe interactions.
Serving as a specialized backbone for downstream fine-tuning or Retrieval-Augmented Generation (RAG) systems in soil science.
Supporting research and educational applications requiring deep knowledge in soil microbiology.

This model is intended for research and non-commercial use only, inheriting licensing considerations from its training data.

Overview

SoilFM Language Tower: Domain-Adapted for Soil Science

Key Capabilities & Features

Intended Uses

Full Model Card (README)