IAAR-Shanghai/MemReader-4B-thinking

Task: Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Apr 7, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

IAAR-Shanghai/MemReader-4B-thinking is a 4-billion-parameter language model built on Qwen3-4B and designed for active long-term memory management in agents. It treats memory construction as a reasoning-and-action process: the model explicitly evaluates each piece of incoming information and selects a memory operation (add, search, buffer, or ignore). It targets long-horizon dialogue systems, personalized assistants, and agent frameworks that need low-noise, updatable, and retrievable long-term memory, and it supports a context length of 32,768 tokens.


MemReader-4B-thinking: Active Memory Management for Agents

MemReader-4B-thinking is a 4-billion-parameter language model, based on Qwen3-4B, built for long-term agent memory management. Unlike methods that passively extract memories, it takes an active, reasoning-driven approach. It evaluates incoming information for value, completeness, and ambiguity, then selects one of four memory operations: add_memory, search_memory, buffer_memory, or ignore_memory.
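
To make the four operations concrete, the sketch below declares them as OpenAI-style function tools. Only the four function names come from the description above. The parameter names (`key`, `content`, `query`, `reason`), the descriptions, and the `make_tool` helper are assumptions chosen for illustration, not the model's published schema.

```python
# Hypothetical tool schemas: only the four function names come from the model card.
# All parameters and descriptions are illustrative assumptions.

def make_tool(name, description, properties, required):
    """Build one OpenAI-style function tool definition."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

MEMORY_TOOLS = [
    make_tool("add_memory", "Store a valuable, complete piece of information.",
              {"key": {"type": "string"}, "content": {"type": "string"}},
              ["key", "content"]),
    make_tool("search_memory", "Look up existing memories, e.g. to resolve ambiguity.",
              {"query": {"type": "string"}}, ["query"]),
    make_tool("buffer_memory", "Hold incomplete information until more context arrives.",
              {"content": {"type": "string"}}, ["content"]),
    make_tool("ignore_memory", "Discard input that is not worth storing.",
              {"reason": {"type": "string"}}, []),
]

# Names in declaration order, handy for validating calls later.
TOOL_NAMES = [tool["function"]["name"] for tool in MEMORY_TOOLS]
```

A list shaped like `MEMORY_TOOLS` is what OpenAI-compatible chat endpoints typically accept in their `tools` field. Consult the upstream model card for the schema the model was actually trained with.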

Key Capabilities & Differentiators

  • Active Memory Management: Reframes memory writing as a ReAct-style reasoning process, so the agent decides explicitly what to store, retrieve, or discard.
  • Tool-Calling Workflow: Natively supports OpenAI-style tool calling, exposing each memory operation as an explicit tool call.
  • Enhanced Performance: Reports strong gains in knowledge update, temporal reasoning, and ambiguity resolution on the LOCOMO, LongMemEval, and HaluMem benchmarks.
  • Efficient Deployment: At 4B parameters, it is small enough for efficient local deployment.
  • Thinking Traces: Emits explicit thinking traces alongside tool calls, which makes its decisions inspectable.
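
As an illustration of the thinking-trace and tool-call output described above, here is a minimal parsing sketch. It assumes the model keeps the `<think>...</think>` and `<tool_call>...</tool_call>` tag conventions of its Qwen3 base, which this page does not confirm. The `split_trace_and_calls` helper and the sample output are hypothetical.

```python
import json
import re

# Assumed tag conventions (inherited from Qwen3; not confirmed for this model).
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def split_trace_and_calls(raw_output):
    """Return (thinking_text, list_of_tool_call_dicts) from raw model text."""
    think_match = THINK_RE.search(raw_output)
    thinking = think_match.group(1).strip() if think_match else ""
    calls = []
    for payload in TOOL_CALL_RE.findall(raw_output):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            # Skip malformed payloads rather than failing the whole turn.
            continue
    return thinking, calls

# Hand-written sample, not real model output.
sample = (
    "<think>The user states a stable preference; it is complete.</think>\n"
    '<tool_call>{"name": "add_memory", '
    '"arguments": {"key": "drink", "content": "User prefers tea."}}</tool_call>'
)
thinking, calls = split_trace_and_calls(sample)
```

Many serving stacks already split reasoning text and tool calls into structured response fields, in which case manual parsing like this is unnecessary.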

Recommended Use Cases

  • Long-term Conversational Agents: Maintains coherent, updatable memory across extended dialogues.
  • Personalized Assistants: Lets assistants build and manage user profiles and preferences over time.
  • Agent Memory Pipelines: Converts conversational context into structured, retrievable long-term memory.
  • Memory Update & Conflict Resolution: Handles new information by updating or overwriting older memories to keep them accurate.
  • Retrieval-Augmented Memory Systems: Designed to integrate with systems that require dynamic memory interaction.
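
The use cases above all rely on application code that executes the model's chosen operation against some store. The sketch below shows one possible shape for that glue, with a toy in-process store in which a repeated `key` overwrites the older value. The `MemoryStore` class, the `key`/`content`/`query` arguments, and the overwrite policy are assumptions for illustration, not part of the model or its tooling.

```python
class MemoryStore:
    """Toy in-process store; a real system would use a database or vector index."""

    def __init__(self):
        self.memories = {}  # hypothetical key -> latest content
        self.buffer = []    # incomplete items awaiting more context

    def apply(self, call):
        """Execute one parsed tool call of the form {"name": ..., "arguments": {...}}."""
        name = call["name"]
        args = call.get("arguments", {})
        if name == "add_memory":
            # Reusing a key overwrites the older value (simple conflict resolution).
            self.memories[args["key"]] = args["content"]
            return "stored"
        if name == "search_memory":
            # Naive case-insensitive substring match stands in for real retrieval.
            query = args["query"].lower()
            return [c for c in self.memories.values() if query in c.lower()]
        if name == "buffer_memory":
            self.buffer.append(args["content"])
            return "buffered"
        if name == "ignore_memory":
            return "ignored"
        raise ValueError(f"unknown memory operation: {name}")

store = MemoryStore()
store.apply({"name": "add_memory",
             "arguments": {"key": "city", "content": "User lives in Berlin."}})
# A later statement about the same key replaces the stale memory.
store.apply({"name": "add_memory",
             "arguments": {"key": "city", "content": "User lives in Lisbon."}})
hits = store.apply({"name": "search_memory", "arguments": {"query": "lives in"}})
```

After the second `add_memory` call, the search returns only the Lisbon entry, which is the update-and-overwrite behavior the list describes. In practice the result of each `apply` call would be sent back to the model as a tool response so it can continue reasoning.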