vandijklab/C2S-Scale-Gemma-2-27B

TEXT GENERATION
Concurrency Cost: 2
Model Size: 27B
Quant: FP8
Ctx Length: 32k
Published: Oct 6, 2025
License: cc-by-4.0
Architecture: Transformer
0.2K
Open Weights
Cold

C2S-Scale-Gemma-2-27B is a 27 billion parameter language model developed by van Dijk Lab (Yale), Google Research, and Google DeepMind, built upon the Gemma-2 architecture. It is fine-tuned for single-cell biology and reads scRNA-seq data as 'cell sentences' that encode gene expression. It is suited to tasks such as cell type prediction, tissue classification, and generating gene expression profiles. The model was trained on over 57 million cells and supports a context length of 32,768 tokens.


C2S-Scale-Gemma-2-27B: Single-Cell Biology LLM

C2S-Scale-Gemma-2-27B is a 27 billion parameter language model built as a collaboration between Yale's van Dijk Lab, Google Research, and Google DeepMind. It combines the Gemma-2 architecture with the Cell2Sentence (C2S) framework, which represents single-cell RNA sequencing (scRNA-seq) data as 'cell sentences': sequences of gene names ordered by descending expression level. The model was trained on over 57 million human and mouse cells from CellxGene and the Human Cell Atlas, scaling the C2S approach to a 27 billion parameter LLM for biological analysis.
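To make the 'cell sentence' idea concrete, here is a minimal sketch of the conversion for a single cell. It is an illustration only, not the lab's preprocessing code: the function name `to_cell_sentence`, the toy expression values, the choice to drop unexpressed genes, and the alphabetical tie-break are all assumptions made for this example.

```python
def to_cell_sentence(expression, top_k=None):
    """Rank genes by descending expression and join their names with spaces.

    expression: dict mapping gene name -> expression value (e.g. a count).
    top_k: optionally keep only the k most highly expressed genes.
    """
    # Assumption: genes with zero expression are left out of the sentence.
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Sort by value, highest first; break ties alphabetically so the
    # result is deterministic (a choice made for this sketch).
    expressed.sort(key=lambda item: (-item[1], item[0]))
    genes = [gene for gene, _ in expressed]
    if top_k is not None:
        genes = genes[:top_k]
    return " ".join(genes)


# Toy cell with made-up counts.
cell = {"CD3E": 12, "MALAT1": 85, "ACTB": 40, "GAPDH": 0, "IL7R": 12}
print(to_cell_sentence(cell))           # MALAT1 ACTB CD3E IL7R
print(to_cell_sentence(cell, top_k=2))  # MALAT1 ACTB
```

The sentence keeps only the ordering of genes, not their raw values, which is what lets a text model consume expression data without any change to its tokenizer or architecture.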

Key Capabilities

  • Single-Cell Data Understanding: Processes high-dimensional scRNA-seq data by converting it into a language-like format.
  • Versatile Performance: Demonstrates strong results across diverse single-cell and multi-cell tasks, including advanced downstream applications like cluster captioning and perturbation prediction.
  • Generative Power: Capable of generating realistic single-cell gene expression profiles for in silico experiments.
  • Foundation Model: Serves as a powerful pretrained base for fine-tuning on specialized, domain-specific single-cell analysis tasks.
  • Scalability: Trained on a massive dataset using Google's TPU v5s, enabling a significant increase in model size and capability.
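Using generated cells in an in silico experiment means turning a generated cell sentence back into numbers. The sketch below assumes a log-linear relationship between a gene's rank in the sentence and its expression level. The function name `sentence_to_expression` and the `slope` and `intercept` values are hypothetical; in practice such parameters would be fit on the dataset that produced the sentences.

```python
import math


def sentence_to_expression(sentence, slope, intercept):
    """Estimate per-gene expression from each gene's 1-based rank.

    Assumed model for this sketch: expression = slope * log10(rank) + intercept.
    """
    estimates = {}
    for rank, gene in enumerate(sentence.split(), start=1):
        # If the generated text repeats a gene, keep its first (highest) rank.
        if gene in estimates:
            continue
        estimates[gene] = slope * math.log10(rank) + intercept
    return estimates


# Hypothetical parameters, chosen only to make the example readable.
profile = sentence_to_expression("MALAT1 ACTB CD3E IL7R", slope=-1.0, intercept=3.0)
for gene, value in profile.items():
    print(f"{gene}\t{value:.3f}")
```

Genes that do not appear in the sentence are simply absent from the result and can be treated as unexpressed.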

Good for

  • Cell Type Prediction & Annotation: Streamlining the annotation of large-scale single-cell datasets.
  • Biomarker Discovery: Identifying gene patterns for specific cell states or diseases.
  • In Silico Experiments: Generating cells under specific conditions to test biological hypotheses.
  • Research in Single-Cell Genomics: A foundational tool for computational biology and interpreting scRNA-seq experiments.
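For the cell type annotation use case, a cell sentence has to be wrapped in a natural-language prompt before it is sent to the model. The sketch below shows one way to do that while capping the number of genes so the prompt stays well inside the 32k context. The function name `build_cell_type_prompt`, the default `max_genes`, and the prompt wording are assumptions for illustration; check the model's own documentation for the prompt format it was trained with.

```python
def build_cell_type_prompt(cell_sentence, organism="Homo sapiens", max_genes=200):
    """Wrap a cell sentence in a cell type prediction prompt.

    The wording here is a hypothetical template, not the official one.
    """
    # Keep only the highest-ranked genes to bound the prompt length.
    genes = cell_sentence.split()[:max_genes]
    return (
        f"The following is a list of {len(genes)} gene names ordered by "
        f"descending expression level in a {organism} cell. "
        "Your task is to give the cell type which this cell belongs to.\n"
        f"Cell sentence: {' '.join(genes)}.\n"
        "The cell type corresponding to these genes is:"
    )


prompt = build_cell_type_prompt("MALAT1 ACTB CD3E IL7R", max_genes=3)
print(prompt)
```

The returned string can then be passed to whatever text-generation client is used to serve the model, and the completion read back as the predicted cell type.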