JarvisEvo/JarvisEvo

VISION | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Dec 13, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

JarvisEvo is an 8-billion-parameter model developed by Yunlong Lin et al. that functions as a self-evolving photo editing agent. It uses interleaved multimodal Chain-of-Thought (iMCoT) reasoning for image editing, integrating multi-step planning, dynamic tool orchestration, and iterative visual feedback. The model evaluates and refines its own output, combining Adobe Lightroom for expert-level refinement with Qwen-Image-Edit for creative synthesis. Its primary strength is a closed-loop workflow for producing edits that are visually compelling and aligned with the creative intent.


JarvisEvo: Self-Evolving Photo Editing Agent

JarvisEvo is an 8-billion-parameter model designed as a self-evolving agent for photo editing. Developed by Yunlong Lin et al., it introduces an interleaved multimodal Chain-of-Thought (iMCoT) reasoning framework, which lets the model carry out complex image editing tasks through multi-step planning, dynamic tool orchestration, and continuous visual feedback.

Key Capabilities

  • iMCoT Reasoning: Interleaves planning steps with visual analysis of the image as it is being edited.
  • Self-Evaluation and Refinement: Integrates a closed-loop workflow for self-correction, ensuring outputs are both visually appealing and consistent with the creative intent.
  • Tool Orchestration: Combines specialized tools, using Adobe Lightroom for precise adjustments and Qwen-Image-Edit for generative tasks, to pair expert-level control with creative generation.
  • Iterative Visual Feedback: Continuously processes visual information to guide and refine the editing process.
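
The closed loop described by the capabilities above (plan, call a tool, inspect the result, refine) can be illustrated with a toy sketch. This is not JarvisEvo's implementation or API: the `Image` class, the `adjust_exposure` tool, the `plan` and `evaluate` functions, and the score threshold are all hypothetical stand-ins, and a single brightness value stands in for a real photo.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Image:
    """Toy stand-in for a photo: one brightness value in [0, 1]."""
    brightness: float


@dataclass
class Step:
    """One planned tool call (hypothetical structure)."""
    tool: str
    amount: float


def adjust_exposure(img: Image, amount: float) -> Image:
    """Hypothetical 'precise adjustment' tool; clamps the result to [0, 1]."""
    return Image(min(1.0, max(0.0, img.brightness + amount)))


# Hypothetical tool registry; a real agent would route to external editors.
TOOLS: Dict[str, Callable[[Image, float], Image]] = {
    "adjust_exposure": adjust_exposure,
}


def plan(img: Image, target: float) -> Step:
    """Stand-in for the reasoning step: propose the next edit from the
    current visual state, moving at most 0.2 per step."""
    gap = target - img.brightness
    return Step("adjust_exposure", max(-0.2, min(0.2, gap)))


def evaluate(img: Image, target: float) -> float:
    """Stand-in for self-evaluation: score in [0, 1], higher is better."""
    return 1.0 - abs(target - img.brightness)


def edit_loop(img: Image, target: float, threshold: float = 0.95,
              max_steps: int = 10) -> Tuple[Image, List[Step]]:
    """Plan, apply a tool, re-evaluate; stop once the score is good enough."""
    trace: List[Step] = []
    for _ in range(max_steps):
        if evaluate(img, target) >= threshold:
            break  # self-evaluation accepts the current result
        step = plan(img, target)                   # planning
        img = TOOLS[step.tool](img, step.amount)   # tool orchestration
        trace.append(step)                         # loop back: visual feedback
    return img, trace


final, trace = edit_loop(Image(brightness=0.2), target=0.8)
print(f"{len(trace)} steps, final brightness {final.brightness:.2f}")
```

The point of the sketch is the control flow: evaluation runs on every intermediate result rather than once at the end, so each new plan is conditioned on what the previous tool call actually produced.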

Good For

  • Automated, high-quality photo editing requiring complex, multi-step operations.
  • Applications needing a blend of precise, expert-level adjustments and creative generative image modifications.
  • Research and development in multimodal AI agents and self-improving systems for visual tasks.