zstanjj/MemSifter-4B-Thinking

Text Generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Mar 1, 2026 · License: MIT · Architecture: Transformer · Open weights

MemSifter-4B-Thinking by zstanjj is a 4 billion parameter generative session ranker, fine-tuned from Qwen3-4B with DAPO reinforcement learning. It specializes in fine-grained reranking of conversational sessions, optimizing for marginal utility and rank-sensitive relevance. This model serves as a core component for offloading LLM memory retrieval, enhancing the context provided to downstream chat LLMs.


MemSifter-4B-Thinking Overview

MemSifter-4B-Thinking is a 4 billion parameter generative session ranker developed by zstanjj, designed to enhance LLM memory retrieval. It functions as the core retrieval component within the MemSifter framework, which offloads memory retrieval through outcome-driven proxy reasoning. The model's primary role is to perform fine-grained reranking of candidate conversation sessions, which have been pre-filtered by a dense embedding model like bge-m3.

Key Capabilities

  • Generative Reranking: Excels at re-ordering conversational sessions to surface the most relevant ones for a given user query.
  • Reinforcement Learning: Fine-tuned with the DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) algorithm, using a task reward that combines marginal utility and rank-sensitive components.
  • Context Optimization: Designed to provide highly relevant context to downstream chat LLMs, improving their performance in conversational memory tasks.
  • System Integration: Operates as the second stage in a three-stage pipeline: Session Embedding → Session Ranking (MemSifter) → Chat LLM.
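The three-stage pipeline above can be sketched as follows. This is a minimal illustration with toy stand-in functions, not the MemSifter API: `embed_filter` mimics the dense prefilter stage (bge-m3 in the real system) with a trivial lexical-overlap score, `rerank` stands in for MemSifter-4B-Thinking, and `answer` for the downstream chat LLM.

```python
def embed_filter(query: str, sessions: list[str], top_k: int = 3) -> list[str]:
    # Stage 1: dense prefilter (bge-m3 in the real pipeline).
    # Here, a toy lexical-overlap score stands in for embedding similarity.
    q_tokens = set(query.lower().split())
    scored = sorted(
        sessions,
        key=lambda s: len(q_tokens & set(s.lower().split())),
        reverse=True,
    )
    return scored[:top_k]


def rerank(query: str, candidates: list[str]) -> list[str]:
    # Stage 2: MemSifter-4B-Thinking would generate a fine-grained
    # ranking here; this stub simply preserves the candidate order.
    return list(candidates)


def answer(query: str, context: list[str]) -> str:
    # Stage 3: the downstream chat LLM consumes the top-ranked
    # sessions as memory context.
    return f"Answering {query!r} with {len(context)} retrieved sessions"


sessions = [
    "We discussed the trip to Kyoto last spring",
    "Grocery list: milk, eggs, bread",
    "Kyoto hotel booking confirmed for April",
]
candidates = embed_filter("When is the Kyoto trip?", sessions)
ranked = rerank("When is the Kyoto trip?", candidates)
reply = answer("When is the Kyoto trip?", ranked)
```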

Training Details

MemSifter-4B-Thinking is fine-tuned from Qwen3-4B. Its training data is bootstrapped from the MemSifter embedding pipeline across multiple conversational memory benchmarks (e.g., LoCoMo, LongMemEval, PersonaMem), utilizing NDCG-based anchor sampling to construct RL training trajectories.
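Since the training trajectories are constructed with NDCG-based anchor sampling, a brief refresher on the metric may help. The sketch below is a standard NDCG implementation, not code from the MemSifter training pipeline:

```python
import math


def dcg(relevances: list[float]) -> float:
    # Discounted cumulative gain: each relevance score is discounted
    # by the log of its 1-indexed rank position.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(ranked_relevances: list[float]) -> float:
    # Normalize DCG by the DCG of the ideal (descending) ordering,
    # so a perfect ranking scores 1.0.
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


perfect = ndcg([3.0, 2.0, 1.0])   # ideal order -> 1.0
degraded = ndcg([1.0, 2.0, 3.0])  # reversed order -> below 1.0
```

A rank-sensitive reward built on NDCG penalizes placing a relevant session low in the list more heavily than a simple set-overlap reward would, which matches the model's stated optimization target.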

Usage

The model outputs a chain-of-thought within <think>...</think> tags and a comma-separated session ranking within <ranking>...</ranking> tags, facilitating its integration into retrieval-augmented generation (RAG) systems.
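Given that output format, downstream code needs to separate the reasoning trace from the ranking. A minimal parser sketch (the helper name and regex approach are illustrative, not part of the model's tooling):

```python
import re


def parse_memsifter_output(text: str) -> tuple[str, list[str]]:
    # Extract the chain-of-thought from <think>...</think> and the
    # comma-separated session ranking from <ranking>...</ranking>.
    think_match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    rank_match = re.search(r"<ranking>(.*?)</ranking>", text, re.DOTALL)

    thought = think_match.group(1).strip() if think_match else ""
    ranking = (
        [s.strip() for s in rank_match.group(1).split(",") if s.strip()]
        if rank_match
        else []
    )
    return thought, ranking


raw = "<think>Session 2 mentions the trip directly.</think><ranking>2, 0, 1</ranking>"
thought, ranking = parse_memsifter_output(raw)
# ranking is now ["2", "0", "1"], ready to reorder the candidate sessions
# before they are handed to the downstream chat LLM.
```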