BennyDaBall/Z-Image-Engineer-V6
BennyDaBall/Z-Image-Engineer-V6 is a 4 billion parameter Qwen text encoder, fine-tuned from Tongyi-MAI/Z-Image-Turbo, optimized for enhancing image generation prompts and serving as a direct text encoder for Z-Image workflows. It excels at transforming minimal prompts into rich, structured visual narratives by adding explicit scene composition, lighting, and texture details. This model is designed for dual-role performance, functioning as a local prompt-enhancement tool for LM Studio and a merged HF text encoder for Z-Image, built using the SMART DoRA training system.
Loading preview...
Z-Image-Engineer V6 Overview
Z-Image-Engineer V6 is a 4 billion parameter Qwen text encoder, fine-tuned from Tongyi-MAI/Z-Image-Turbo, specifically designed to enhance and encode prompts for image generation. It transforms simple, minimal prompts into highly structured and descriptive visual narratives, adding details like scene composition, lighting, material textures, and depth separation. The model also actively removes common "empty prompt sludge" such as "8k, masterpiece, trending on ArtStation."
Key Capabilities
- Prompt Enhancement: Upgrades basic concepts into detailed, high-fidelity visual prompts for local use.
- Text Encoder Swap: Can directly replace the stock Z-Image Qwen text encoder to generate varied conditioning from the same seed.
- Hybrid Mode: Allows for V6 to rewrite a prompt and then encode it, effectively managing both the narrative and the image model's input.
- Private Local Workflow: Optimized for use with LM Studio, ComfyUI, and
llama.cppfor privacy-focused local operations.
Under the Hood: SMART DoRA Training
V6 utilizes the SMART DoRA (Weight-Decomposed Low-Rank Adaptation) training system, which provides precise adapter updates by decoupling directional and magnitude adjustments. SMART incorporates auxiliary pressure through various regularizers to prevent repetitive prompt loops and superficial patterns:
- Entropic Regularizer: Increases output probability diversity, reducing generic vocabulary.
- Holographic Regularizer: Enforces structured, depth-wise feature logic for improved foreground/background hierarchy.
- Topological Regularizer: Stabilizes coherent latent trajectories, ensuring natural prompt flow.
- Manifold Regularizer: Regulates overall weight distributions for stable behavior under refinement.
The model's final architecture is a blended composite from a multi-stage refinement pipeline, including a base pass, retention pass, supervised refinement (SceneClean SFT32), and binary anti-repeat refinement (AntiRepeat Binary24), culminating in a balanced blend for vivid descriptions and tight syntax.
Good For
- Developers and artists seeking to generate more detailed and controlled images from simpler text prompts.
- Users of LM Studio, ComfyUI, and
llama.cppwho require a powerful, locally runnable prompt enhancement and text encoding solution. - Anyone looking to replace or augment their existing Z-Image text encoder for different image generation outcomes.