PeterPaker123/Qwen2.5-7B-Vietnamese-Medical-NER

TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:May 30, 2026Architecture:Transformer Cold

PeterPaker123/Qwen2.5-7B-Vietnamese-Medical-NER is a 7.6 billion parameter Qwen2.5-based causal language model developed by PeterPaker123. It is strictly fine-tuned for Vietnamese Medical Named Entity Recognition (NER), acting as a dynamic, prompt-driven extraction agent. This model excels at identifying and extracting any medical, clinical, or demographic entities from Vietnamese texts, outputting results in a structured JSON format. Its primary strength lies in dynamic, prompt-defined entity extraction, moving beyond traditional fixed-class NER models.

Loading preview...

Overview

This model, developed by PeterPaker123, is a Qwen2.5-7B-based generative Large Language Model specifically fine-tuned for Vietnamese Medical Named Entity Recognition (NER). Unlike traditional token-classification models, it functions as an intelligent, prompt-driven extraction agent, capable of identifying and extracting a wide range of medical, clinical, or demographic entities from Vietnamese texts.

Key Capabilities

  • Dynamic Entity Extraction: Extracts any entity type specified in the system prompt, moving beyond fixed, predefined classes.
  • Structured JSON Output: Delivers extracted entities in a highly structured JSON format, including "entity" and "type" keys.
  • Broad Medical Scope: Can target categories such as SYMPTOM_AND_DISEASE, MEDICAL_PROCEDURE, DRUG, BODY_PART / ANATOMY, DOSAGE & MEASUREMENT, and DEMOGRAPHICS.
  • Supervised Fine-Tuning (SFT): Trained on a vast array of entity types across multiple Vietnamese medical datasets, learning generalized dynamic extraction skills.

Use Cases

This model is ideal for integrating into healthcare applications requiring structured data extraction from Vietnamese medical text. Potential applications include:

  • Enriching medical chatbots (RAG pipelines).
  • Automating the structuring of telehealth transcripts and clinical notes.
  • Constructing clinical knowledge graphs.
  • Facilitating epidemiological data mining and surveillance.

Limitations

  • Not for Diagnosis: Strictly an extraction tool; does not provide medical diagnosis, prescription, or advice.
  • Language Specificity: Exclusively trained for the Vietnamese language.
  • Hallucination Risk: As a generative LLM, it may occasionally alter extracted text or hallucinate entities, especially with ambiguous input.