Huggggooo/ProtoCycle-7B-SFT
ProtoCycle-7B-SFT by Huggggooo is a 7.6 billion parameter instruction-tuned model, initialized from Qwen2.5-7B-Instruct, specifically designed for agentic protein design. It is trained to invoke biology tools like scaffold retrieval and protein scoring using a / / / protocol. This model excels at multi-turn agentic tool-use trajectories for complex protein engineering tasks, operating with a 32K token context length.
Loading preview...
ProtoCycle-7B-SFT: Agentic Protein Design Model
ProtoCycle-7B-SFT is a 7.6 billion parameter instruction-tuned model developed by Huggggooo, serving as the supervised fine-tuning (SFT) checkpoint for the broader ProtoCycle agentic protein design project. It is built upon the Qwen/Qwen2.5-7B-Instruct base model and is specifically trained to interact with biology tools for protein engineering tasks.
Key Capabilities & Features
- Agentic Tool-Use: Designed to invoke specialized biology tools such as scaffold retrieval, constraint building, ESM inpainting, and ProTrek scoring.
- Structured Interaction: Utilizes a defined agent protocol:
<think> / <plan> / <tool_call> / <answer>for sequential reasoning and tool execution. - Protein Design Focus: Fine-tuned on 2,000 agentic multi-turn trajectories specifically for protein design, available in the Huggggooo/ProtoCycle-Data dataset.
- High Context Length: Supports a sequence length of 32,768 tokens, enabling complex multi-turn interactions and detailed problem-solving.
- RL Stage Foundation: This SFT checkpoint is the initial stage for the subsequent reinforcement learning (RL) phase of the ProtoCycle project.
When to Use This Model
ProtoCycle-7B-SFT is ideal for researchers and developers working on:
- Automated Protein Engineering: Tasks requiring an AI agent to plan and execute steps involving various biological tools.
- Tool-Augmented LLMs: Developing systems where language models need to interact with external APIs or specialized functions in a structured manner.
- Agentic Workflow Development: Experimenting with and building agents that follow a think-plan-tool-answer paradigm for scientific problem-solving.
This model provides a robust foundation for agentic applications in the field of protein design, leveraging its specialized training and tool-use protocol.