Uni-SMART/SciLitLLM

Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32K · Published: Aug 26, 2024 · License: MIT · Architecture: Transformer · Open weights

SciLitLLM is a 7.6 billion parameter causal language model developed by Uni-SMART, based on the Qwen2 architecture and specialized for scientific literature understanding. It uses a hybrid strategy of continual pre-training and supervised fine-tuning to infuse scientific domain knowledge and strengthen instruction-following. The model outperforms other LLMs under 15B parameters on scientific literature understanding benchmarks such as SciAssess and SciRIFF. It is designed for tasks requiring deep comprehension of scientific texts, with a context length of 131,072 tokens.


SciLitLLM: Specialized for Scientific Literature Understanding

SciLitLLM-7B is a 7.6 billion parameter language model, adapted from the Qwen2-7B architecture and designed for scientific literature comprehension. Developed by Uni-SMART, the model employs a hybrid training strategy that combines continual pre-training (CPT) and supervised fine-tuning (SFT). This approach simultaneously integrates scientific domain knowledge and refines the model's ability to follow instructions for domain-specific tasks.
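Because SciLitLLM derives from Qwen2-7B, it can be driven with the standard Hugging Face `transformers` workflow. The sketch below is a minimal, hedged example: the repo id `Uni-SMART/SciLitLLM` is assumed from the title above (verify the exact id on the hub), and the manual prompt builder mirrors the ChatML format used by Qwen2 models; in practice, `tokenizer.apply_chat_template` is the safer way to build prompts, since it reads the template bundled with the checkpoint.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Build a single-turn ChatML prompt (the format Qwen2-family models use).

    This mirrors what tokenizer.apply_chat_template produces for one
    system + user exchange, ending with the assistant header so the
    model continues from there.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


def generate(prompt: str,
             model_id: str = "Uni-SMART/SciLitLLM",  # assumed repo id
             max_new_tokens: int = 256) -> str:
    """Run one completion. Heavy: downloads the checkpoint on first call."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:],
                            skip_special_tokens=True)
```

For example, `generate(build_chatml_prompt("You are an assistant for scientific literature understanding.", "Summarize this abstract in two sentences: ..."))` would produce a short summary.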

Key Capabilities & Training Insights

  • Domain Adaptation: Achieved through a meticulous pipeline that constructs high-quality CPT corpora and generates diverse SFT instructions, covering PDF text extraction, error correction, quality filtering, and synthetic instruction creation.
  • Enhanced Performance: Demonstrates an average performance improvement of 3.6% on SciAssess and 10.1% on SciRIFF benchmarks when compared to leading LLMs with fewer than 15 billion parameters.
  • Context Length: Supports a context length of 131,072 tokens, crucial for processing full-length scientific articles.
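Even a 131,072-token window can be exceeded by multi-paper workloads, so long inputs may still need to be split before inference. A minimal, hypothetical chunking sketch (not part of the SciLitLLM pipeline); the whitespace word count is a crude token proxy, and the model tokenizer's length function should be swapped in for accurate budgets:

```python
def chunk_by_token_budget(paragraphs, budget=131072,
                          count_tokens=lambda s: len(s.split())):
    """Greedily pack paragraphs into chunks that each fit a token budget.

    count_tokens defaults to a whitespace-split approximation; pass
    e.g. lambda s: len(tokenizer(s)["input_ids"]) for exact counts.
    """
    chunks, current, used = [], [], 0
    for p in paragraphs:
        n = count_tokens(p)
        # Start a new chunk when adding this paragraph would overflow.
        if current and used + n > budget:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(p)
        used += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each resulting chunk can then be summarized or queried independently, with the per-chunk answers merged in a final pass.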

When to Use SciLitLLM

SciLitLLM is particularly well-suited for applications requiring deep understanding and processing of scientific texts. Developers should consider this model for use cases such as:

  • Summarizing scientific articles.
  • Extracting key information from research papers.
  • Answering questions based on scientific literature.
  • Analyzing and synthesizing information across multiple scientific documents.

For more detailed information on its development and methodology, refer to the accompanying paper.