g34634/qwen2.5-3b-memory-summary-v1

Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 14, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

g34634/qwen2.5-3b-memory-summary-v1 is a 3.1-billion-parameter fine-tune of Qwen2.5-3B-Instruct for extracting structured memory states from multi-turn conversations. It generates a JSON object containing key facts, unresolved references, the topic, a turn count, and a one-sentence conversation summary. The model is designed to preprocess dialogues for downstream components such as routers, RAG systems, and other LLMs in conversational AI pipelines.


Overview

This model, g34634/qwen2.5-3b-memory-summary-v1, is a fine-tuned version of the Qwen2.5-3B-Instruct base model. Its primary function is to act as a Memory State Generator within a multi-turn dialogue system. It processes conversational input and outputs a structured JSON object containing a memory_state and a memory_summary.
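The card does not publish the exact output schema. Below is a minimal sketch of parsing and validating an output of the shape described above; the field names are taken from this card, but the sample values and the validation helper are hypothetical:

```python
import json

# Field names from the model card; everything else below is illustrative.
TOP_LEVEL_KEYS = {"memory_state", "memory_summary"}
STATE_KEYS = {"key_facts", "unresolved_refs", "topic", "turn_count"}

def parse_memory_output(raw: str) -> dict:
    """Parse the model's raw text output and check it has the documented fields."""
    obj = json.loads(raw)
    missing = TOP_LEVEL_KEYS - obj.keys()
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")
    state_missing = STATE_KEYS - obj["memory_state"].keys()
    if state_missing:
        raise ValueError(f"missing memory_state keys: {state_missing}")
    return obj

# Hypothetical model output for a two-turn booking dialogue.
raw = '''{
  "memory_state": {
    "key_facts": ["user wants a flight to Berlin", "departure on Friday"],
    "unresolved_refs": ["return date"],
    "topic": "flight booking",
    "turn_count": 2
  },
  "memory_summary": "The user is booking a Friday flight to Berlin; the return date is still open."
}'''

mem = parse_memory_output(raw)
print(mem["memory_state"]["topic"])  # flight booking
```

Validating the JSON before passing it downstream lets the pipeline retry or fall back when the model emits malformed output.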

Key Capabilities

  • Structured Memory Extraction: Generates a JSON output with key_facts, unresolved_refs, topic, and turn_count from ongoing dialogues.
  • Conversation Summarization: Provides a concise, one-sentence memory_summary of the conversation so far.
  • Pipeline Preprocessing: Designed to run early in a dialogue pipeline, providing structured context for subsequent components like routers, RAG systems, or other LLMs.
  • Fine-tuned Performance: Trained using SFT + LoRA on a combination of DialogSum and QMSum datasets, achieving a final validation loss of 0.693.
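The card does not include inference code. A minimal sketch with Hugging Face `transformers`, assuming the model uses the standard Qwen2.5 chat template; the instruction wording in the system message is an assumption, not the documented training prompt:

```python
# Sketch only: assumes the standard Qwen2.5 chat template; the system-prompt
# wording is an assumption, not the documented training format.
def build_messages(dialogue: str) -> list[dict]:
    """Wrap a raw multi-turn dialogue in a chat-format request for a memory state."""
    return [
        {"role": "system", "content": "Extract a JSON memory_state and "
                                      "memory_summary from the conversation."},
        {"role": "user", "content": dialogue},
    ]

def generate_memory(dialogue: str) -> str:
    """Run the model and return its raw (ideally JSON) completion."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # pip install transformers
    model_id = "g34634/qwen2.5-3b-memory-summary-v1"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
    prompt = tok.apply_chat_template(
        build_messages(dialogue), tokenize=False, add_generation_prompt=True
    )
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

In a pipeline, `generate_memory` would run once per user turn, with its JSON output fed to the router or retriever rather than shown to the user.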

Use Cases & Limitations

Good for:

  • Enhancing multi-turn dialogue systems by providing structured context.
  • Improving routing and retrieval accuracy in conversational AI.
  • Summarizing ongoing conversations for quick understanding.

Limitations:

  • turn_count extraction may be inaccurate depending on dialogue format.
  • key_facts can sometimes be more abstract summaries than concrete facts.
  • Optimized for short- to medium-length conversations; the maximum sequence length is 512 tokens.