InternScience/Agents-K1
InternScience/Agents-K1 is a 4 billion parameter language model fine-tuned from Qwen3-4B-Instruct-2507, specialized for knowledge extraction tasks like Named Entity Recognition (NER) and Relation Extraction (RE) in English scientific and general-domain texts. It utilizes GRPO (Group Relative Policy Optimization) and produces structured JSON extractions with explicit step-by-step reasoning. The model achieves an average of +3.3 absolute F1 improvement over its base model across 10 NER/RE benchmarks, making it ideal for knowledge-graph construction and scientific-literature mining.
Loading preview...
Overview
InternScience/Agents-K1 is a 4B-parameter language model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507, specifically for knowledge extraction. It excels at Named Entity Recognition (NER) and Relation Extraction (RE) from English scientific and general-domain texts.
Key Capabilities & Features
- Structured JSON Output: Generates structured JSON extractions, including explicit step-by-step reasoning within a
<think>…</think><answer>…</answer>schema, ensuring auditable reasoning and reliable parsing. - Performance Gains: Achieves a +3.3 absolute F1 improvement on average across 10 NER/RE benchmarks compared to the base Qwen3-4B-Instruct model, with gains observed on every evaluated dataset, including held-out CrossNER domains.
- Training Methodology: Trained using GRPO (Group Relative Policy Optimization) on the IEPile information-extraction corpus, leveraging rule-based rewards for format, JSON validity, and task F1, without requiring human preference data.
Intended Use Cases
- Scientific-literature mining: Extracting entities and relations in fields like biomedicine, chemistry, and computer science.
- Knowledge-graph construction: Building structured knowledge bases from unstructured text.
- Pre-processing for advanced AI systems: Preparing data for retrieval and multi-hop Question Answering (QA) systems.
Limitations
- Schema-driven prompting is mandatory. The model is specialized for structured extraction and will likely produce malformed JSON for free-form queries; explicit entity/relation type lists must always be provided.