Overview
DLM-NL2JSON-4B is a specialized 4-billion-parameter model developed by Data Science Lab., Ltd. It is a Qwen3-4B fine-tune with merged LoRA adapters, trained to extract structured JSON from Korean natural-language queries. The model is built for a specific production service, the Busan Metropolitan City public data analytics service, and is not a general-purpose NL-to-JSON converter.
Key Capabilities & Performance
This model demonstrates exceptional performance on its target task, achieving 94.4% accuracy (96.8% adjusted) on 2,041 test samples. It significantly outperforms larger models like GPT-4o (80.5%) and Qwen3.5-35B (72.2%) in its domain. DLM-NL2JSON-4B shows particularly strong gains in categories like population patterns (ALP) and credit statistics, winning 8 out of 10 evaluated categories.
Important Considerations:
- Service-Specific: This model is trained exclusively for a fixed set of predefined schemas and will not generalize to arbitrary JSON schemas or different prompt formats.
- Strict Usage Requirements: Users must employ the exact system prompts and include the corresponding special tokens (e.g., `<TASK_CSM>`) for correct operation.
- Korean Only: All training data and prompts are in Korean.
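The exact system prompt text and schemas are service-specific and not published in this card; only the task token `<TASK_CSM>` is documented. As a minimal sketch, assuming a chat-style message format and an illustrative system prompt, a caller might assemble the input and parse the completion like this:

```python
import json

# Hypothetical system prompt: the real wording is fixed by the service
# and must be reproduced exactly; this string is illustrative only.
SYSTEM_PROMPT = "<TASK_CSM> 아래 질의를 정해진 JSON 스키마로 변환하세요."


def build_messages(query: str) -> list[dict]:
    """Assemble chat messages in the usual system/user layout."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query},
    ]


def parse_model_output(raw: str) -> dict:
    """Parse the model's raw completion, failing loudly on invalid JSON."""
    return json.loads(raw)


messages = build_messages("부산시 2023년 인구 통계를 보여줘")

# Illustrative completion only; the real output shape depends on the
# predefined schema selected by the task token.
sample_output = '{"task": "CSM", "region": "부산광역시", "year": 2023}'
parsed = parse_model_output(sample_output)
```

Because the model was trained on one fixed prompt format, any deviation (a paraphrased system prompt, a missing task token) falls outside its training distribution and should be expected to degrade output quality.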
Intended Use
This model is ideal for converting Korean natural language queries about public and economic data into structured JSON, specifically within the context of the Busan Metropolitan City Big Data Wave analytics dashboard. It serves as a reference for the effectiveness of domain-specific fine-tuning for constrained structured output tasks, enabling smaller, more efficient models to surpass general-purpose LLMs in specialized applications.
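Since the model emits JSON against a fixed set of predefined schemas, callers typically validate completions before passing them downstream. The real schemas are not published here; the sketch below uses a hypothetical required-field set for one task type to show the kind of shallow check a consumer might run:

```python
import json

# Hypothetical schema fragment: the service's actual schemas are fixed
# but not documented in this card. Field names and types are assumed.
REQUIRED_FIELDS = {"task": str, "region": str, "year": int}


def validate_output(raw: str) -> dict:
    """Parse a completion and shallowly check required fields and types."""
    obj = json.loads(raw)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(obj.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    return obj


result = validate_output('{"task": "CSM", "region": "부산광역시", "year": 2023}')
```

A stricter deployment could swap this for full JSON Schema validation, but even a shallow type check catches the most common failure mode of constrained-output models: a syntactically valid object with a missing or wrongly typed field.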