Surromind/RetrievalLLM-preview

Text Generation · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Mar 21, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

Surromind/RetrievalLLM-preview is a 14.8 billion parameter Qwen2.5-based model fine-tuned by Surromind for Retrieval Augmented Generation (RAG) tasks. It excels at generating accurate answers with explicit source citations in a structured JSON format, making it ideal for applications requiring grounded responses from provided documents. The model was trained on a specialized dataset including RAG, CoT, and benchmark data, focusing on precise information retrieval and structured output.


Surromind/RetrievalLLM-preview: RAG-Specialized Qwen2.5 Model

Surromind/RetrievalLLM-preview is a 14.8 billion parameter model built upon the Qwen2.5 architecture, specifically fine-tuned for Retrieval Augmented Generation (RAG) tasks. Its core strength lies in providing accurate answers and their corresponding sources from input documents, formatted as a structured JSON output.

Key Capabilities

  • Grounded Responses: Generates answers directly supported by provided documents.
  • Source Citation: Automatically includes the doc_id and the exact quoted passage (source) for verification.
  • Structured Output: Delivers responses in a predefined JSON format, including related_document, source, answer (plain), and grounded_answer (with inline citations).
  • Specialized Training: Fine-tuned using a proprietary dataset combining RAG-specific data, Chain-of-Thought (CoT) examples, and various machine reading comprehension benchmarks (AIhub datasets).
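The JSON fields listed above suggest a response shape like the following. This is a minimal sketch based only on the field names in this card; the exact schema, nesting, and inline-citation markup are assumptions, and the sample response is invented for illustration:

```python
import json

# Fields this card says appear in the model's structured output.
EXPECTED_KEYS = {"related_document", "source", "answer", "grounded_answer"}

def parse_rag_response(raw: str) -> dict:
    """Parse the model's JSON output and check for the documented fields.

    The flat schema assumed here is inferred from the field names in
    this card; the real model output may differ in shape or key names.
    """
    data = json.loads(raw)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response missing expected fields: {sorted(missing)}")
    return data

# A hypothetical model response, for illustration only.
example = json.dumps({
    "related_document": "doc_3",
    "source": "Revenue grew 12% year over year.",
    "answer": "Revenue grew 12% year over year.",
    "grounded_answer": "Revenue grew 12% year over year [doc_3].",
})
```

Validating the output this way lets an application reject malformed generations before surfacing citations to users.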

Training Details

The model was trained on eight 80GB H100 GPUs with a tokenizer max length of 4,500 and a learning rate of 5e-06. Training data included AIhub's administrative, news, book, table, numerical, and financial/legal machine reading comprehension datasets, alongside Korean CoT and instruction datasets.

Ideal Use Cases

This model is particularly well-suited for applications requiring high-precision information extraction and verifiable answers from a given corpus, such as enterprise knowledge bases, legal document analysis, or customer support systems where source attribution is critical.
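As a sketch of how such a corpus-grounded workflow might feed the model, the snippet below assembles a prompt from doc_id-tagged documents. The tagging convention and instruction wording are illustrative assumptions, not the model's documented prompt format:

```python
def build_rag_prompt(question: str, documents: dict[str, str]) -> str:
    """Assemble a prompt presenting doc_id-tagged documents plus a question.

    The "[doc_id] text" layout is a hypothetical convention for
    illustration; consult the model's chat template for the real format.
    """
    doc_block = "\n".join(
        f"[{doc_id}] {text}" for doc_id, text in documents.items()
    )
    return (
        "Answer using only the documents below, citing doc_id.\n\n"
        f"{doc_block}\n\nQuestion: {question}"
    )

prompt = build_rag_prompt(
    "What was the revenue growth?",
    {"doc_1": "Revenue grew 12% year over year.",
     "doc_2": "Headcount stayed flat."},
)
```

Keeping the doc_id visible in the prompt is what lets the model echo it back in the related_document and grounded_answer fields.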