THGLab/Llama-3.1-8B-SmileyLlama-1.1

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Jul 23, 2025License:llama3.1Architecture:Transformer Cold

THGLab/Llama-3.1-8B-SmileyLlama-1.1 is an 8 billion parameter language model fine-tuned from Llama-3.1-8B-Instruct, specifically designed for generating SMILES string representations of drug-like molecules. This model excels at creating molecular structures based on specified chemical properties, offering a specialized tool for cheminformatics and drug discovery applications. It leverages the Llama-3.1 architecture to efficiently produce SMILES strings on demand, making it suitable for high-throughput molecular design. The model's primary strength lies in its ability to generate molecules adhering to various chemical constraints, as detailed in its associated ArXiv preprint.

Loading preview...

THGLab/Llama-3.1-8B-SmileyLlama-1.1 Overview

THGLab/Llama-3.1-8B-SmileyLlama-1.1 is an 8 billion parameter language model, fine-tuned from Llama-3.1-8B-Instruct, with a specialized focus on generating SMILES (Simplified Molecular Input Line Entry System) strings for drug-like molecules. This model, dubbed SmileyLlama, was trained on millions of molecules to enable on-demand generation of chemical structures.

Key Capabilities

  • SMILES String Generation: Generates valid SMILES strings for molecules.
  • Property-Guided Generation: Can generate molecules based on a wide range of specified chemical properties, including:
    • H-bond donors and acceptors
    • Molecular weight
    • logP
    • Rotatable bonds
    • Fraction sp3
    • TPSA (Topological Polar Surface Area)
    • Presence of macrocycles
    • Absence/presence of "bad SMARTS" (structural alerts)
    • Absence/presence of specific covalent warheads (e.g., sulfonyl fluorides, acrylamides, epoxides)
    • Substructure matching (e.g., "A substructure of SMILES_STRING")
    • Chemical formula matching (e.g., "A chemical of CHEMICAL_FORMULA")
  • Efficient Generation: Supports num_return_sequences for rapid generation of multiple SMILES strings, limited by memory.

Use Cases

This model is particularly well-suited for applications in cheminformatics, computational chemistry, and drug discovery where the generation of novel molecules with specific desired properties is required. It can be used to:

  • Design new drug candidates by specifying desired physicochemical properties.
  • Generate libraries of molecules for virtual screening.
  • Explore chemical space based on structural and property constraints.

For more in-depth technical details, refer to the associated ArXiv preprint.