MiroThinker-8B-SFT-v0.2: An Agentic Research Model
MiroThinker-8B-SFT-v0.2, developed by miromind-ai, is the 8-billion-parameter supervised fine-tuned (SFT) checkpoint in the open-source MiroThinker agentic model series, designed for complex, long-horizon problem-solving. It functions as a research agent, integrating a suite of advanced capabilities to handle diverse real-world applications.
Key Capabilities & Improvements
This v0.2 iteration introduces significant enhancements over its predecessor, focusing on improved performance and generalization:
- Comprehensive Agentic Functions: Excels in task decomposition, multi-hop reasoning, retrieval-augmented generation (RAG), code execution, web browsing, and document/file processing.
- Enhanced Training Data: Utilizes richer training data from both English and Chinese sources, contributing to substantial gains in benchmark performance.
- Unified DPO Training: Applies a single, consistent preference dataset across all models in the series, improving alignment.
- Extended Context Length: Supports a 32,768-token context window per the model card (the project README states 64k; the discrepancy is unresolved), enabling longer and more challenging multi-turn tool-use tasks. A quick way to check what a downloaded checkpoint actually supports is sketched after this list.
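Because the documented context length is inconsistent (64k in the README vs. 32k on the model card), a minimal sketch like the following can confirm what a given checkpoint ships with. The repo ID follows the naming used in this card and the `max_position_embeddings` field is the standard Hugging Face convention; neither is confirmed by the release itself.

```python
from transformers import AutoConfig

# Repo ID assumed from the naming used in this card; adjust if the
# checkpoint is hosted under a different path.
MODEL_ID = "miromind-ai/MiroThinker-8B-SFT-v0.2"

# The context window shipped with a checkpoint is recorded in its config,
# so reading it settles the 32k-vs-64k question for the weights you
# actually downloaded.
config = AutoConfig.from_pretrained(MODEL_ID)
print(config.max_position_embeddings)  # expected: 32768 per the model card
```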
Performance Highlights
MiroThinker-v0.2 demonstrates consistent improvements across benchmarks: GAIA-Text-103, for instance, rose from 57.3 to 64.1, and BrowseComp-ZH from 17.0 to 29.4, indicating significant advances in its general research-agent capabilities. The model was trained on the large-scale MiroVerse-v0.2 trajectory and preference datasets using the MiroTrain framework, with tool use provided via MiroFlow.
Ideal Use Cases
- Complex Problem Solving: Suited for tasks requiring multi-step reasoning and the integration of multiple tools (a minimal tool-calling sketch follows this list).
- Research & Information Retrieval: Effective for applications needing web browsing, document processing, and RAG capabilities.
- Code-Related Tasks: Capable of code execution, making it useful for development and analysis workflows.
- Multilingual Agent Applications: Benefits from rich English and Chinese training data, supporting diverse linguistic contexts.
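As an illustration of the agentic use cases above, the sketch below wires the model into a single tool-calling turn with Hugging Face transformers. It assumes the published chat template accepts tool schemas via the `tools` argument of `apply_chat_template`, which is an assumption, not something the release confirms; the repo ID and the `web_search` helper are likewise placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "miromind-ai/MiroThinker-8B-SFT-v0.2"  # assumed repo ID

def web_search(query: str) -> str:
    """Search the web and return a short summary of the top results.

    Args:
        query: The search query string.
    """
    ...  # placeholder: a real agent loop would call a search API here

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

messages = [{"role": "user", "content": "Summarize recent work on long-horizon agents."}]

# If the chat template supports tool schemas (an assumption here), the
# function signature and docstring are rendered into the prompt, and the
# model can emit a structured call for the agent loop to execute.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[web_search],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

In a full agent loop, the generated tool call would be parsed, `web_search` executed, and its result appended as a tool message before generating again.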