OmniChem-7B-v1 by Billy-Liu-DUT is a 7.6 billion parameter instruction-tuned causal language model built upon Qwen2.5-7B-Instruct, specialized for chemistry. It features systematic hallucination mitigation by internalizing physical constraints and structured reasoning, demonstrating high performance in tasks like photophysical property modulation, physicochemical property optimization, and synthesis planning. The model was developed through continued pre-training on a 5-billion-token specialized corpus and fine-tuned with 563,000 chemistry-specific QA pairs and Chain-of-Thought entries. It supports a context length of up to 128K tokens using YaRN for long text processing.
Loading preview...
OmniChem-7B-v1: A Specialized Chemistry LLM
OmniChem-7B-v1 is a 7.6 billion parameter instruction-tuned causal language model developed by Billy-Liu-DUT, specifically designed for the domain of chemistry. Built upon the Qwen2.5-7B-Instruct architecture, this model addresses the critical challenge of hallucination in scientific applications by integrating physical constraints and structured reasoning patterns.
Key Capabilities and Innovations
- Systematic Hallucination Mitigation: Reduces the generation of factually incorrect information by internalizing domain-specific constraints.
- Expert-Level Chemistry Performance: Excels in core chemistry research tasks, including photophysical property modulation, physicochemical property optimization, and synthesis planning.
- Robust Foundation: Underwent continued pre-training on a 5-billion-token specialized chemistry corpus and fine-tuned with 199,589 QA pairs and 363,045 Chain-of-Thought (CoT) entries, available as the OmniChem-563K dataset.
- Extended Context Length: Supports up to 128K tokens for processing long texts through the integration of YaRN techniques.
When to Use OmniChem-7B-v1
This model is ideal for academic and non-commercial applications requiring accurate and detailed chemical reasoning. It is particularly well-suited for:
- Generating synthetic routes for small molecules.
- Optimizing chemical properties.
- Modulating photophysical properties.
- Any chemistry-related task where factual accuracy and structured reasoning are paramount.