guangshuo/CellReasoner-7B
CellReasoner-7B by guangshuo is a 7.6 billion parameter large language model, built on Qwen2.5-7B-Instruct, specifically enhanced for biological reasoning. It excels at zero-shot and few-shot cell type annotation for single-cell RNA-seq (scRNA-seq) and scATAC-seq data, demonstrating superior performance in interpretability and generalization. This model is optimized for marker-by-marker annotation, ontology mapping, and biological reasoning, requiring only a few expert-level reasoning samples for activation.
Loading preview...
CellReasoner-7B: Reasoning-Enhanced Cell Type Annotation
CellReasoner-7B is a specialized 7.6 billion parameter large language model developed by guangshuo, fine-tuned from Qwen2.5-7B-Instruct, designed for advanced cell type annotation. Its core innovation lies in its ability to activate expert-level biological reasoning with minimal supervision, requiring only a few expert-level reasoning samples.
Key Capabilities
- Expert-Level Interpretability: Provides clear, reasoning-based explanations for cell type assignments.
- Zero-/Few-Shot Generalization: Achieves high accuracy on unseen datasets with limited or no prior examples.
- Superior Performance: Outperforms general-purpose LLMs like Deepseek and ChatGPT, as well as traditional methods like singleR, on scRNA-seq (e.g., PBMC3K, PDAC datasets) and scATAC-seq data.
- Versatile Annotation: Supports marker-by-marker annotation, ontology mapping, and complex biological reasoning tasks.
- Scalable & Efficient: Delivers accurate and interpretable cell annotation with minimal data requirements.
Good For
- Biomedical Researchers: Annotating cell types in single-cell sequencing data (scRNA-seq, scATAC-seq).
- Computational Biologists: Developing and evaluating reasoning-enhanced models for biological applications.
- Drug Discovery: Identifying specific cell populations relevant to disease mechanisms or therapeutic targets.
CellReasoner-7B is part of a model zoo that also includes CellReasoner-32B, built on QwQ-32B, offering a range of capabilities for diverse research needs. The model leverages the LLaMA-Factory framework for efficient fine-tuning.