LOGOS-Hub/LOGOS-pretrain-1B
LOGOS-pretrain-1B is a 1-billion-parameter autoregressive Transformer developed by LOGOS-Hub as a multi-domain generative framework for the natural sciences. It operates on a unified scientific grammar that encodes scientific objects such as proteins, small molecules, and materials as token sequences. The model targets generation, prediction, and design tasks across scientific domains, including ligand design, retrosynthesis, and material generation, without requiring explicit 3D geometric networks.
Overview
LOGOS (Language Of Generative Objects in Science) is a multi-domain generative framework built on a unified scientific grammar. A single autoregressive model processes and generates diverse scientific objects, including proteins, antibodies, small molecules, chemical reactions, and materials, by encoding them as token sequences over a shared vocabulary. Rather than relying on natural language intermediaries or explicit 3D geometric networks, LOGOS works directly with domain-native representations: it discretizes and tokenizes key spatial relationships so that complex structural interactions can be learned from the sequence itself.
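To make the idea of a shared vocabulary concrete, here is a minimal Python sketch. It is not the LOGOS tokenizer: the domain tags, the `aa:`/`sm:` prefixes, the character-level SMILES split, and the function names `build_vocab` and `encode` are all assumptions made for illustration. It only shows how objects from two domains could map into one integer id space.

```python
from typing import Dict, List

# Hypothetical token inventories; the actual LOGOS grammar is not specified here.
DOMAIN_TAGS = ["<protein>", "</protein>", "<molecule>", "</molecule>"]
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
# Character-level SMILES symbols (a simplification: "Cl" becomes "C" + "l").
SMILES_CHARS = list("CNOSPFHclnos()[]=#@+-123456789")


def build_vocab() -> Dict[str, int]:
    """Assign one integer id per token, across all domains, in a single table."""
    tokens = (
        DOMAIN_TAGS
        + [f"aa:{a}" for a in AMINO_ACIDS]
        + [f"sm:{c}" for c in SMILES_CHARS]
    )
    vocab: Dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def encode(domain: str, text: str, vocab: Dict[str, int]) -> List[int]:
    """Wrap a domain-native string in domain tags and map it to shared ids."""
    prefix = {"protein": "aa", "molecule": "sm"}[domain]
    tokens = [f"<{domain}>"] + [f"{prefix}:{ch}" for ch in text] + [f"</{domain}>"]
    return [vocab[t] for t in tokens]


vocab = build_vocab()
protein_ids = encode("protein", "MKT", vocab)
molecule_ids = encode("molecule", "CCO", vocab)
# Both sequences live in the same id space and can be concatenated.
mixed_ids = protein_ids + molecule_ids
```

Keeping the prefixes distinct means cysteine (`aa:C`) and carbon (`sm:C`) receive different ids even though they share a character, which is one simple way to avoid collisions between domains.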
Key Capabilities
- Unified Scientific Grammar: Provides a common discrete token space for heterogeneous scientific objects and their relationships.
- Multi-Domain Functionality: A single model handles tasks across proteins, small molecules, materials, reactions, and antibodies.
- Spatial Interaction Learning: Captures spatial contact and constraint patterns through tokenized representations, without requiring geometric neural networks.
- Consistent Pre-training: Ensures formal consistency between pre-training objectives and downstream task goals within the grammar space.
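As a rough illustration of how spatial contacts might be turned into discrete tokens, the sketch below bins pairwise distances and emits index and bin tokens for close pairs. The bin edges, the token names (`<i0>`, `<d0>`), and the functions `distance_bin` and `contact_tokens` are hypothetical and chosen only for this example. The scheme LOGOS actually uses is not described here.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical bin edges in angstroms; bin k covers distances below BIN_EDGES[k].
BIN_EDGES = [4.0, 6.0, 8.0, 12.0]


def distance_bin(a: Coord, b: Coord) -> int:
    """Return the index of the first bin whose upper edge exceeds the distance."""
    d = math.dist(a, b)
    for k, edge in enumerate(BIN_EDGES):
        if d < edge:
            return k
    return len(BIN_EDGES)  # overflow bin for everything beyond the last edge


def contact_tokens(coords: Sequence[Coord], max_bin: int = 2) -> List[str]:
    """Emit <i>, <j>, <bin> token triples for every pair within max_bin."""
    out: List[str] = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            k = distance_bin(coords[i], coords[j])
            if k <= max_bin:
                out += [f"<i{i}>", f"<i{j}>", f"<d{k}>"]
    return out


points = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
tokens = contact_tokens(points)
```

With the three points above, only the first pair falls within the contact range, so the far-away third point contributes no tokens. A plain sequence model can then read these triples like any other tokens.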
Supported Tasks
LOGOS demonstrates competitive performance across various downstream tasks, including:
- Interaction-Aware Ligand Design for Binding Pockets
- Protein Ligand-Binding Site Identification
- Retrosynthesis Prediction
- Unconditional Material Generation
- Protein Editing
- Antibody CDR Design
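One way to picture how tasks like these fit a single autoregressive model is to frame each as a conditioning prefix followed by a target sequence, trained with the same next-token objective as pre-training. The sketch below assumes that framing. The `<sep>` and `<eos>` tokens, the loss mask convention, and the function `make_example` are illustrative assumptions, not the documented LOGOS format.

```python
from typing import List, Tuple


def make_example(
    condition: List[str], target: List[str], sep: str = "<sep>", eos: str = "<eos>"
) -> Tuple[List[str], List[int]]:
    """Concatenate condition and target; mark only target positions for the loss."""
    tokens = condition + [sep] + target + [eos]
    # 0 = context only, 1 = position contributes to the next-token loss.
    loss_mask = [0] * (len(condition) + 1) + [1] * (len(target) + 1)
    return tokens, loss_mask


# Retrosynthesis framed as: product (condition) -> reactants (target).
product = list("CC(=O)OCC")      # ethyl acetate, character-level for simplicity
reactants = list("CC(=O)O.OCC")  # acetic acid and ethanol
tokens, loss_mask = make_example(product, reactants)
```

Under this framing, unconditional material generation would simply be the case with an empty condition, and ligand design would place the pocket tokens in the condition slot.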