Huggggooo/ProtoCycle-7B-SFT

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32K · Published: Apr 18, 2026 · License: apache-2.0 · Architecture: Transformer

ProtoCycle-7B-SFT by Huggggooo is a 7.6 billion parameter instruction-tuned model, initialized from Qwen2.5-7B-Instruct and designed for agentic protein design. It is trained to invoke biology tools such as scaffold retrieval and protein scoring using a <think> / <plan> / <tool_call> / <answer> protocol. The model targets multi-turn agentic tool-use trajectories for complex protein engineering tasks and operates with a 32K token context length.


ProtoCycle-7B-SFT: Agentic Protein Design Model

ProtoCycle-7B-SFT is a 7.6 billion parameter instruction-tuned model developed by Huggggooo, serving as the supervised fine-tuning (SFT) checkpoint for the broader ProtoCycle agentic protein design project. It is built upon the Qwen/Qwen2.5-7B-Instruct base model and is specifically trained to interact with biology tools for protein engineering tasks.

Key Capabilities & Features

  • Agentic Tool-Use: Designed to invoke specialized biology tools such as scaffold retrieval, constraint building, ESM inpainting, and ProTrek scoring.
  • Structured Interaction: Utilizes a defined agent protocol: <think> / <plan> / <tool_call> / <answer> for sequential reasoning and tool execution.
  • Protein Design Focus: Fine-tuned on 2,000 agentic multi-turn trajectories specifically for protein design, available in the Huggggooo/ProtoCycle-Data dataset.
  • High Context Length: Supports a sequence length of 32,768 tokens, enabling complex multi-turn interactions and detailed problem-solving.
  • RL Stage Foundation: This SFT checkpoint is the initial stage for the subsequent reinforcement learning (RL) phase of the ProtoCycle project.
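Because the agent protocol uses fixed tags, a runtime can extract tool calls from a model turn mechanically. Below is a minimal sketch: the tag names follow the protocol above, while the regex, helper name, and example tool-call payload are illustrative assumptions, not part of the released model:

```python
import json
import re

# Pull the JSON payload out of a <tool_call> block. The tag names follow the
# ProtoCycle protocol; the payload shape shown here is an assumption.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_call(turn: str):
    """Return the parsed tool-call dict from a model turn, or None."""
    match = TOOL_CALL_RE.search(turn)
    return json.loads(match.group(1)) if match else None

turn = (
    "<think>Need a scaffold before inpainting.</think>"
    "<plan>1. Retrieve scaffold 2. Score candidates</plan>"
    '<tool_call>{"name": "scaffold_retrieval", '
    '"arguments": {"query": "beta barrel"}}</tool_call>'
)
call = extract_tool_call(turn)
# call["name"] is "scaffold_retrieval"
```

A production harness would also validate the payload against each tool's schema before dispatching it.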

When to Use This Model

ProtoCycle-7B-SFT is ideal for researchers and developers working on:

  • Automated Protein Engineering: Tasks requiring an AI agent to plan and execute steps involving various biological tools.
  • Tool-Augmented LLMs: Developing systems where language models need to interact with external APIs or specialized functions in a structured manner.
  • Agentic Workflow Development: Experimenting with and building agents that follow a think-plan-tool-answer paradigm for scientific problem-solving.
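The think-plan-tool-answer paradigm amounts to a driver loop: the model emits a turn, the runtime executes any tool call, appends the result, and repeats until an `<answer>` tag appears. The sketch below illustrates that loop under stated assumptions: the tag names and the `scaffold_retrieval` tool name come from this card, but `generate()` is a scripted stand-in for real model inference, and the dispatch table and message format are hypothetical:

```python
import json
import re

def generate(messages):
    # Stand-in for model inference: a scripted two-turn trajectory.
    if not any(m["role"] == "tool" for m in messages):
        return ('<think>Find a scaffold first.</think>'
                '<tool_call>{"name": "scaffold_retrieval", '
                '"arguments": {"query": "TIM barrel"}}</tool_call>')
    return "<answer>Proposed scaffold: TIM barrel hit #1</answer>"

# Illustrative tool backend; a real one would call retrieval/scoring services.
TOOLS = {
    "scaffold_retrieval": lambda query: [f"hit for {query!r}"],
}

def run_agent(task, max_turns=8):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        turn = generate(messages)
        messages.append({"role": "assistant", "content": turn})
        answer = re.search(r"<answer>(.*?)</answer>", turn, re.DOTALL)
        if answer:
            return answer.group(1)
        call = re.search(r"<tool_call>(\{.*\})</tool_call>", turn, re.DOTALL)
        if call:
            spec = json.loads(call.group(1))
            result = TOOLS[spec["name"]](**spec["arguments"])
            messages.append({"role": "tool", "content": json.dumps(result)})
    return None

final = run_agent("Design a small TIM-barrel scaffold.")
```

The loop terminates either on an `<answer>` tag or after `max_turns` iterations, which bounds runaway trajectories.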

This model provides a robust foundation for agentic applications in the field of protein design, leveraging its specialized training and tool-use protocol.