THGLab/Llama-3.1-8B-SmileyLlama-1.1
THGLab/Llama-3.1-8B-SmileyLlama-1.1 is an 8-billion-parameter language model fine-tuned from Llama-3.1-8B-Instruct to generate SMILES string representations of drug-like molecules. Given a set of specified chemical properties, it produces molecular structures intended to satisfy them, which makes it a specialized tool for cheminformatics and drug discovery. It uses the Llama-3.1 architecture to produce SMILES strings on demand, making it suitable for high-throughput molecular design. Its primary strength is generating molecules that adhere to a variety of chemical constraints, as detailed in the associated arXiv preprint.
THGLab/Llama-3.1-8B-SmileyLlama-1.1 Overview
THGLab/Llama-3.1-8B-SmileyLlama-1.1 is an 8-billion-parameter language model, fine-tuned from Llama-3.1-8B-Instruct, that specializes in generating SMILES (Simplified Molecular Input Line Entry System) strings for drug-like molecules. The model, dubbed SmileyLlama, was trained on millions of molecules to enable on-demand generation of chemical structures.
Key Capabilities
- SMILES String Generation: Generates valid SMILES strings for molecules.
- Property-Guided Generation: Can generate molecules based on a wide range of specified chemical properties, including:
  - H-bond donors and acceptors
  - Molecular weight
  - logP
  - Rotatable bonds
  - Fraction sp3
  - TPSA (Topological Polar Surface Area)
  - Presence of macrocycles
  - Absence/presence of "bad SMARTS" (structural alerts)
  - Absence/presence of specific covalent warheads (e.g., sulfonyl fluorides, acrylamides, epoxides)
  - Substructure matching (e.g., "A substructure of SMILES_STRING")
  - Chemical formula matching (e.g., "A chemical of CHEMICAL_FORMULA")
- Efficient Generation: Supports num_return_sequences for rapid generation of multiple SMILES strings per call; the number of sequences is limited by available memory.
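The capabilities above can be sketched with the Hugging Face transformers library. This is a minimal sketch, not the authors' reference code: the prompt wording in build_prompt, the helper names, and the default values for n and max_new_tokens are assumptions made for illustration, so check the model card for the exact prompt template the model was trained on. It also assumes torch, transformers, and accelerate are installed and that enough GPU memory is available for an 8B model.

```python
MODEL_ID = "THGLab/Llama-3.1-8B-SmileyLlama-1.1"


def build_prompt(properties):
    """Join property constraints into a single instruction string."""
    # ASSUMPTION: this wording is illustrative only; the real prompt
    # template is defined by the model card, not by this sketch.
    return (
        "Output a SMILES string for a drug like molecule with the following "
        "properties: " + ", ".join(properties)
    )


def generate_smiles(properties, n=8, max_new_tokens=128):
    """Sample n candidate SMILES strings for the given constraints."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(properties), return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,  # sampling is needed to get n distinct sequences
            num_return_sequences=n,  # memory use grows with n
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Drop the prompt tokens so only the newly generated text is decoded.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.batch_decode(outputs[:, prompt_len:], skip_special_tokens=True)
```

A call such as generate_smiles(["<= 5 H-bond donors", "<= 500 molecular weight"], n=16) would return a list of 16 decoded strings. If memory runs out, lowering n and calling the function several times is the simplest workaround.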
Use Cases
This model is particularly well-suited for applications in cheminformatics, computational chemistry, and drug discovery where the generation of novel molecules with specific desired properties is required. It can be used to:
- Design new drug candidates by specifying desired physicochemical properties.
- Generate libraries of molecules for virtual screening.
- Explore chemical space based on structural and property constraints.
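For library generation and virtual screening, generated strings are usually validated and checked against the requested constraints before further use. The following is a rough sketch of such a post-filter using RDKit, which the source does not mention and is assumed here as a common choice. The function names, the descriptor keys, and the (min, max) limit format are all hypothetical.

```python
def describe(smiles):
    """Return descriptors for a SMILES string, or None if it does not parse."""
    # RDKit is imported lazily so passes_constraints can be used without it.
    from rdkit import Chem
    from rdkit.Chem import Crippen, Descriptors, Lipinski, rdMolDescriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # invalid SMILES
        return None
    return {
        "canonical": Chem.MolToSmiles(mol),
        "mol_wt": Descriptors.MolWt(mol),
        "logp": Crippen.MolLogP(mol),
        "h_donors": Lipinski.NumHDonors(mol),
        "h_acceptors": Lipinski.NumHAcceptors(mol),
        "rot_bonds": rdMolDescriptors.CalcNumRotatableBonds(mol),
        "tpsa": rdMolDescriptors.CalcTPSA(mol),
        "frac_sp3": rdMolDescriptors.CalcFractionCSP3(mol),
    }


def passes_constraints(props, limits):
    """limits maps a descriptor name to (min, max); None means unbounded."""
    for name, (lo, hi) in limits.items():
        value = props[name]
        if lo is not None and value < lo:
            return False
        if hi is not None and value > hi:
            return False
    return True


def filter_library(smiles_list, limits):
    """Keep valid, unique molecules whose descriptors satisfy the limits."""
    kept = {}
    for smi in smiles_list:
        props = describe(smi)
        if props is not None and passes_constraints(props, limits):
            # Key on canonical SMILES so duplicates collapse to one entry.
            kept.setdefault(props["canonical"], props)
    return kept
```

For example, filter_library(candidates, {"mol_wt": (None, 500), "h_donors": (None, 5)}) would keep only parseable, unique molecules within those illustrative thresholds. Separating describe from passes_constraints keeps the constraint logic testable without RDKit installed.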
For more in-depth technical details, refer to the associated arXiv preprint.