northenlab/soilfm-qwen2.5-14b-literature-cpt

Task: Text Generation | Concurrency Cost: 1 | Model Size: 14.8B | Quant: FP8 | Ctx Length: 32k | Published: Dec 31, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

northenlab/soilfm-qwen2.5-14b-literature-cpt is a 14.2 billion parameter, Qwen2.5-based causal language model, domain-adapted for soil science and soil microbiology. Developed by the Northen Lab at Lawrence Berkeley National Laboratory, it was created by continued pretraining on 200,000 curated soil science text passages. The model is intended for generating explanations of soil microbial processes and for use as a soil-science-aware backbone in research and educational applications.

SoilFM Language Tower: Domain-Adapted for Soil Science

This model, northenlab/soilfm-qwen2.5-14b-literature-cpt, is a specialized large language model (LLM) built on Qwen2.5-14B-Instruct (14.2 billion parameters). Developed by the Northen Lab at Lawrence Berkeley National Laboratory, it is the "Language Tower" component of the multi-modal SoilFM2 foundation model for soil microbiome analysis. Its primary differentiator is domain adaptation to soil science and soil microbiology through continued pretraining.

Key Capabilities & Features

  • Domain-Specific Knowledge: Adapted through continued pretraining on 200,000 curated text passages from sources such as PubMed Central soil microbiology papers, Wikipedia soil science articles, and the USDA Soil Survey Manual.
  • High Context Length: Inherits a 32,768-token context window from its base model, suitable for processing extensive scientific texts.
  • Efficient Training: Continued pretraining used QLoRA (4-bit NF4) to keep memory requirements low, and validation loss improved by 7.2% over 1,500 steps.
  • Integration with SoilFM2: Designed to provide domain-grounded context within the broader SoilFM2 multi-modal pipeline, supporting applications like prebiotic recommendation.
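The QLoRA setup mentioned above can be sketched roughly as follows. This is only an illustration of a 4-bit NF4 configuration using the Hugging Face transformers and peft libraries, not the authors' actual training script. The only detail taken from this card is the 4-bit NF4 quantization; the LoRA rank, alpha, dropout, target modules, double quantization, and compute dtype are all assumptions, and the QLoRASettings and build_configs names are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class QLoRASettings:
    """Illustrative settings; only 4-bit NF4 is stated in the model card."""
    load_in_4bit: bool = True
    quant_type: str = "nf4"       # stated in the card
    double_quant: bool = True     # assumption
    lora_rank: int = 16           # assumption
    lora_alpha: int = 32          # assumption
    lora_dropout: float = 0.05    # assumption
    # Assumption: adapt the attention projections of the Qwen2.5 blocks.
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")


def build_configs(settings: QLoRASettings):
    """Build quantization and adapter configs.

    Imports are local so the settings can be inspected without
    torch, transformers, or peft installed.
    """
    import torch
    from transformers import BitsAndBytesConfig
    from peft import LoraConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=settings.load_in_4bit,
        bnb_4bit_quant_type=settings.quant_type,
        bnb_4bit_use_double_quant=settings.double_quant,
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption
    )
    lora_config = LoraConfig(
        r=settings.lora_rank,
        lora_alpha=settings.lora_alpha,
        lora_dropout=settings.lora_dropout,
        target_modules=list(settings.target_modules),
        task_type="CAUSAL_LM",
    )
    return bnb_config, lora_config
```

In a sketch like this, the quantization config would be passed to the base model loader as quantization_config and the LoRA config applied with peft's get_peft_model, so that only the small adapter weights are trained while the 4-bit base weights stay frozen.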

Intended Uses

  • Generating detailed explanations of complex soil microbial processes, rhizosphere ecology, and plant-microbe interactions.
  • Serving as a specialized backbone for downstream fine-tuning or Retrieval-Augmented Generation (RAG) systems in soil science.
  • Supporting research and educational applications requiring deep knowledge in soil microbiology.
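As a rough illustration of the explanation and RAG use cases listed above, the sketch below assembles a plain-text prompt from retrieved passages and generates a completion with the Hugging Face transformers library. It assumes the repository loads with the standard AutoTokenizer and AutoModelForCausalLM classes and that the accelerate package is installed for device_map="auto". The prompt layout and the build_prompt and generate helper names are hypothetical and not taken from the model card.

```python
MODEL_ID = "northenlab/soilfm-qwen2.5-14b-literature-cpt"
MAX_CONTEXT_TOKENS = 32_768  # context window stated in the card


def build_prompt(question: str, passages: list[str]) -> str:
    """Join numbered passages and a question into one plain-text prompt."""
    context = "\n\n".join(
        f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1)
    )
    return f"Context:\n{context}\n\nQuestion: {question.strip()}\nAnswer:"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion; downloads the model weights on first use."""
    # Local import so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]
    if prompt_len + max_new_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError("prompt plus generation budget exceeds the context window")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)
```

A call might look like generate(build_prompt("How do root exudates shape the rhizosphere microbiome?", retrieved_passages)), where retrieved_passages is a list of strings returned by whatever retriever the application uses.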

This model is intended for research and non-commercial use only, inheriting licensing considerations from its training data.