TIGER-Lab/VisCoder-7B
VisCoder-7B by TIGER-Lab is a 7 billion parameter large language model fine-tuned for Python visualization code generation and multi-turn self-correction. It is trained on the VisCode-200K dataset, which integrates validated executable code, natural language instructions, and revision supervision. This model excels at generating semantically meaningful plots from natural language instructions, data structures, and visual outputs, achieving over 90% execution pass rate on Matplotlib and Seaborn in self-debug settings.
Overview
VisCoder-7B, developed by TIGER-Lab, is a 7 billion parameter large language model fine-tuned specifically for generating Python visualization code. It addresses the challenge of producing executable, semantically meaningful plots by aligning natural language instructions with the underlying data structures and the rendered visual outputs. The model is built on the Qwen2.5-Coder-7B-Instruct base model and trained with full-parameter supervised fine-tuning.
Key Capabilities
- Python Visualization Code Generation: Generates Python code for visualization tasks, focusing on libraries like Matplotlib and Seaborn.
- Multi-turn Self-Correction: Capable of revising previously failed code generations over multiple rounds, guided by execution feedback.
- Semantic Plot Alignment: Ensures generated code produces plots that are semantically meaningful and align with natural language instructions and data.
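The self-correction capability above can be illustrated with a minimal sketch of an execution-feedback loop. This is not the released inference code: `generate` is a hypothetical stand-in for a call to VisCoder-7B, and the loop simply re-prompts the model with the traceback when a candidate script fails, mirroring the multi-turn revision setup described here.

```python
import subprocess
import sys
import tempfile

def self_debug(generate, instruction, max_rounds=3):
    """Sketch of multi-turn self-correction driven by execution feedback.

    `generate` is a hypothetical callable standing in for VisCoder-7B: it
    takes the conversation history and returns a candidate Python script.
    """
    history = [{"role": "user", "content": instruction}]
    for _ in range(max_rounds):
        code = generate(history)
        # Execute the candidate script in a subprocess and capture stderr.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code  # script ran cleanly: accept this revision
        # Otherwise feed the traceback back as a new user turn and retry.
        history.append({"role": "assistant", "content": code})
        history.append({"role": "user",
                        "content": f"The code failed:\n{result.stderr}\nPlease fix it."})
    return None  # no executable revision within the round budget
```

In practice the `generate` callable would wrap a chat-formatted call to the model; the loop structure, not the model call, is the point of the sketch.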
Training and Performance
VisCoder-7B was trained on VisCode-200K, a large-scale instruction-tuning dataset comprising over 150K validated Python visualization samples with images and 45K multi-turn correction dialogues. Evaluated on PandasPlotBench, the model demonstrates strong performance, achieving over 90% execution pass rate on Matplotlib and Seaborn under a self-debug setting. This performance approaches that of GPT-4o and significantly outperforms other open-source baselines in this specialized domain.
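The execution pass rate reported above can be computed with a simple harness: run each generated script and count the fraction that finish without raising. This is a simplified sketch of the metric, not the PandasPlotBench harness itself; the in-process `exec` here stands in for benchmark-side sandboxed execution.

```python
def execution_pass_rate(scripts):
    """Fraction of generated Python scripts that execute without error.

    A minimal sketch of the execution-pass-rate metric; each script is run
    in a fresh namespace and any raised exception counts as a failure.
    """
    passed = 0
    for code in scripts:
        try:
            exec(compile(code, "<generated>", "exec"), {})
            passed += 1
        except Exception:
            pass  # execution failure: script does not count toward the rate
    return passed / len(scripts)
```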