ZoengHouNaam/Embed-RL-4B

Vision | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Feb 15, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

ZoengHouNaam/Embed-RL-4B is a 4-billion-parameter multimodal embedding model developed by Tsinghua University and Kuaishou Technology. It uses an Embedder-Guided Reinforcement Learning (EG-RL) framework to optimize a Reasoner that generates evidential Traceability Chain-of-Thought (T-CoT). Feeding this retrieval-relevant reasoning into the Embedder improves universal multimodal embeddings, and the model is reported to outperform pioneering embedding models on the MMEB-V2 and UVRB benchmarks with limited computational resources.


Embed-RL-4B: Reasoning-Driven Multimodal Embeddings

Embed-RL-4B is a 4-billion-parameter multimodal embedding model developed by Tsinghua University and Kuaishou Technology. It addresses limitations of existing generative embedding methods with a reasoning-driven Universal Multimodal Embedding (UME) framework. The framework uses Embedder-Guided Reinforcement Learning (EG-RL) to optimize a Reasoner so that it produces evidential Traceability CoT (T-CoT).

Key Capabilities & Innovations

  • Embedder-Guided Reinforcement Learning (EG-RL): A novel framework where the Embedder provides explicit supervision to the Reasoner, ensuring generated Chain-of-Thought (CoT) traces are aligned with embedding tasks.
  • Traceability CoT (T-CoT): Extracts critical multimodal cues to focus on retrieval-relevant elements, providing multimodal inputs for the Embedder.
  • Enhanced Cross-Modal Semantic Consistency: Integrates multimodal evidence in structured reasoning, paired with retrieval-oriented alignment.
  • Strong Performance with Limited Resources: Outperforms pioneering embedding models on both MMEB-V2 and UVRB benchmarks despite using limited computational resources.
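
The idea of the Embedder supervising the Reasoner can be illustrated with a toy reward signal. The sketch below is not the EG-RL objective from the model's authors; it assumes, purely for illustration, that a reasoning trace is rewarded when the embedding of "query plus T-CoT" lands closer to the correct target than to the hardest negative. The function names (`cosine`, `embedder_guided_reward`) and the margin-style reward are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedder_guided_reward(query_with_cot, positive, negatives):
    """Hypothetical reward for one reasoning trace (an assumption, not the published method).

    query_with_cot: embedding of the query together with the generated T-CoT
    positive:       embedding of the correct retrieval target
    negatives:      embeddings of incorrect targets
    Returns the margin between the positive similarity and the hardest negative.
    """
    pos_score = cosine(query_with_cot, positive)
    hardest_negative = max(
        (cosine(query_with_cot, neg) for neg in negatives),
        default=0.0,  # with no negatives, the reward is just the positive similarity
    )
    return pos_score - hardest_negative
```

Under this assumed reward, a trace that pulls the query embedding toward the right target scores above zero, while one that pulls it toward a distractor scores below zero, which is the kind of signal a policy-gradient update for the Reasoner could consume.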

Good For

  • Developing universal multimodal embeddings for diverse cross-modal tasks.
  • Applications requiring fine-grained matching capabilities across modalities.
  • Scenarios where targeted reasoning optimization can significantly improve multimodal embedding quality and generalization.
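
For the cross-modal matching use cases above, embeddings are typically compared by cosine similarity. The following is a minimal sketch of that ranking step only; it assumes the embeddings have already been produced by the model through whatever loading code its repository provides, and it uses small hand-written vectors as stand-ins. The names `l2_normalize` and `rank_candidates` are illustrative, not part of any Embed-RL-4B API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def rank_candidates(query_vec, candidates, top_k=3):
    """Rank candidates by cosine similarity to the query.

    candidates: dict mapping a candidate id to its embedding vector
    Returns up to top_k (candidate_id, score) pairs, highest score first.
    """
    q = l2_normalize(query_vec)
    scored = []
    for cand_id, vec in candidates.items():
        c = l2_normalize(vec)
        # Dot product of unit vectors equals cosine similarity.
        score = sum(a * b for a, b in zip(q, c))
        scored.append((cand_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Stand-in embeddings; real ones would come from the model.
    text_query = [1.0, 0.0]
    image_pool = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [1.0, 1.0]}
    for cand_id, score in rank_candidates(text_query, image_pool):
        print(cand_id, round(score, 4))
```

Normalizing once and using a dot product keeps the scoring step cheap, which matters when the candidate pool is large and the vectors are stored in an index.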