flashresearch/FlashResearch-4B-Thinking
FlashResearch-4B-Thinking is a 4-billion-parameter Qwen model from flashresearch, distilled from the Tongyi DeepResearch-30B A3B MoE model. It is optimized for web-scale deep research tasks, including browsing, multi-step reasoning, and source-grounded answering. Designed for efficient inference, particularly when integrated with the Alibaba-NLP/DeepResearch framework, it is well suited to fast, low-cost agent runs.
Overview
FlashResearch-4B-Thinking is a dense 4-billion-parameter Qwen model distilled from the larger Tongyi DeepResearch-30B A3B MoE model. The distillation used 33,000 curated deep-research examples from the flashresearch/FlashResearch-DS-33k dataset. The model is primarily intended for integration with the Alibaba-NLP/DeepResearch framework.
Key Capabilities
- Web-scale Deep Research: Optimized for tasks requiring extensive information retrieval and synthesis.
- Multi-step Reasoning: Designed to handle complex queries involving multiple logical steps.
- Source-Grounded Answers: Focuses on providing responses backed by identified sources.
- Efficient Inference: Engineered for fast and low-cost operation, suitable for agent-based applications.
Recommended Use
This model is designed to be used directly with the Alibaba-NLP/DeepResearch repository for agent runs. It offers a cost-effective way to deploy deep research agents: FP16 inference fits on a single 12-16 GB GPU, and quantization lowers the requirement further.
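The 12-16 GB figure follows from simple arithmetic on the weight storage alone. A rough sketch of the estimate (activations, KV cache, and framework overhead add more on top, which is why a plain ~8 GB weight footprint still wants a 12-16 GB card):

```python
# Back-of-envelope VRAM estimate for a 4B-parameter model's weights alone.
# Activations, KV cache, and runtime overhead are not included.
PARAMS = 4e9  # 4 billion parameters (approximate)

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight memory in gigabytes at a given precision."""
    return params * bytes_per_param / 1e9

fp16 = weight_memory_gb(PARAMS, 2)    # 16-bit weights
int8 = weight_memory_gb(PARAMS, 1)    # 8-bit quantization
int4 = weight_memory_gb(PARAMS, 0.5)  # 4-bit quantization

print(f"FP16: ~{fp16:.0f} GB, INT8: ~{int8:.0f} GB, INT4: ~{int4:.0f} GB")
```

At FP16 the weights come to roughly 8 GB, which is why quantized variants can run on GPUs with even less memory.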