MemSifter-4B-Thinking Overview
MemSifter-4B-Thinking is a 4-billion-parameter generative session ranker developed by zstanjj, designed to enhance LLM memory retrieval. It is the core retrieval component of the MemSifter framework, which offloads memory retrieval through outcome-driven proxy reasoning. The model's primary role is fine-grained reranking of candidate conversation sessions that a dense embedding model such as bge-m3 has already pre-filtered.
Key Capabilities
- Generative Reranking: Re-orders candidate conversation sessions so that those most relevant to a given user query rank first.
- Reinforcement Learning: Fine-tuned with the DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) algorithm, using a task reward that combines marginal-utility and rank-sensitive terms.
- Context Optimization: Designed to provide highly relevant context to downstream chat LLMs, improving their performance in conversational memory tasks.
- System Integration: Operates as the second stage in a three-stage pipeline: Session Embedding → Session Ranking (MemSifter) → Chat LLM.
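The three-stage flow described above can be sketched as follows. This is a minimal illustration, not the framework's actual API: `select_sessions`, `embed_score`, `rerank`, `prefilter_k`, and `final_k` are hypothetical names, and the two callables stand in for the embedding model and for MemSifter respectively.

```python
from typing import Callable, List, Sequence


def select_sessions(
    query: str,
    sessions: Sequence[str],
    embed_score: Callable[[str, str], float],              # hypothetical: stage 1 similarity
    rerank: Callable[[str, Sequence[str]], Sequence[int]],  # hypothetical: stage 2 ranker
    prefilter_k: int = 20,
    final_k: int = 5,
) -> List[str]:
    """Return the sessions to hand to the chat LLM, best first."""
    # Stage 1: coarse pre-filtering by embedding similarity (highest score first).
    by_score = sorted(
        range(len(sessions)),
        key=lambda i: embed_score(query, sessions[i]),
        reverse=True,
    )
    candidates = by_score[:prefilter_k]

    # Stage 2: fine-grained reranking. The ranker is assumed to return
    # positions into the candidate list, most relevant first.
    order = rerank(query, [sessions[i] for i in candidates])

    # Ignore out-of-range or repeated positions the ranker may emit.
    seen = set()
    picked = []
    for pos in order:
        if 0 <= pos < len(candidates) and pos not in seen:
            seen.add(pos)
            picked.append(candidates[pos])

    # Stage 3 input: the top sessions become context for the chat LLM.
    return [sessions[i] for i in picked[:final_k]]
```

Passing the two stages in as callables keeps the sketch independent of any particular embedding library or inference server.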
Training Details
MemSifter-4B-Thinking is fine-tuned from Qwen3-4B. Its training data is bootstrapped from the MemSifter embedding pipeline across multiple conversational memory benchmarks (e.g., LoCoMo, LongMemEval, PersonaMem), using NDCG-based anchor sampling to construct RL training trajectories.
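For readers unfamiliar with the metric behind the anchor sampling, the sketch below computes standard NDCG@k with linear gain for a ranked list of relevance labels. It illustrates only the metric; how MemSifter turns NDCG scores into sampled anchors is not described here, and the function names are illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k items (linear gain)."""
    # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    # With no relevant items the ideal DCG is 0; report 0.0 by convention.
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances[i]` is the relevance label of the session placed at rank `i`, so a ranking that puts every relevant session first scores 1.0.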
Usage
The model outputs a chain-of-thought within <think>...</think> tags and a comma-separated session ranking within <ranking>...</ranking> tags. This fixed format makes the ranking straightforward to parse when integrating the model into retrieval-augmented generation (RAG) systems.
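A minimal sketch of parsing this output format, using only the Python standard library. The function name `parse_ranker_output` is illustrative, and session identifiers are kept as strings because their exact form is not specified above.

```python
import re
from typing import List, Optional, Tuple

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_RANKING_RE = re.compile(r"<ranking>(.*?)</ranking>", re.DOTALL)


def parse_ranker_output(text: str) -> Tuple[Optional[str], List[str]]:
    """Split raw model output into (reasoning, ranked session identifiers)."""
    think = _THINK_RE.search(text)
    ranking = _RANKING_RE.search(text)
    reasoning = think.group(1).strip() if think else None
    if ranking is None:
        # No ranking block found; the caller decides how to fall back.
        return reasoning, []

    ids: List[str] = []
    for item in ranking.group(1).split(","):
        item = item.strip()
        # Skip empty entries and keep only the first occurrence of each id.
        if item and item not in ids:
            ids.append(item)
    return reasoning, ids
```

An empty list signals a malformed or truncated generation, in which case a caller might reasonably fall back to the embedding-stage order.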