gogoduan/CodePlot-CoT
CodePlot-CoT by gogoduan is a 32-billion-parameter Vision Language Model (VLM) built on the Qwen2.5-VL architecture, with a 32K context length. The model specializes in mathematical visual reasoning: it generates executable plotting code to represent its 'visual thoughts', the code is rendered into images, and those images are fed back to the model as visual input for subsequent reasoning steps. This code-driven Chain-of-Thought (CoT) paradigm is designed to improve VLMs' ability to solve complex mathematical problems.
CodePlot-CoT: Mathematical Visual Reasoning with Code-Driven Images
CodePlot-CoT is a 32-billion-parameter Vision Language Model (VLM) developed by gogoduan, based on the Qwen2.5-VL architecture with a 32K context length. Its core contribution is a code-driven Chain-of-Thought (CoT) paradigm that lets VLMs perform mathematical visual reasoning by "thinking with images."
Key Capabilities & Features
- Code-Driven Visual Thinking: Instead of generating pixel-based images, CodePlot-CoT outputs executable plotting code to represent its intermediate visual reasoning steps.
- Iterative Reasoning: The generated code is executed to render precise figures, which are then fed back into the model as visual input for subsequent reasoning steps.
- Enhanced Mathematical Problem Solving: This approach enables the model to tackle complex mathematical problems by integrating precise, code-generated visual aids into its reasoning process.
- transformers Compatibility: The model is compatible with the transformers library, facilitating integration and use.
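
The generate, render, and feed-back cycle described in the list above can be sketched as a small control loop. This is a minimal illustration and not the project's actual inference code: the names `Step`, `reason_with_plots`, `model_step`, and `render` are hypothetical, and the model and renderer are replaced with stand-in functions so the loop runs without downloading any weights. In a real setup, `model_step` would presumably wrap a call to the model through transformers, and `render` would execute the plotting code and return the resulting image.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One model turn: reasoning text plus optional plotting code (hypothetical type)."""
    text: str
    plot_code: Optional[str] = None  # None means no figure was requested
    final: bool = False              # True when the model gives its answer


def reason_with_plots(
    question: str,
    model_step: Callable[[str, List[bytes]], Step],
    render: Callable[[str], bytes],
    max_steps: int = 4,
) -> List[Step]:
    """Alternate between model turns and rendering of generated plot code."""
    images: List[bytes] = []  # rendered figures fed back as visual input
    trace: List[Step] = []
    for _ in range(max_steps):
        step = model_step(question, images)
        trace.append(step)
        if step.final:
            break
        if step.plot_code is not None:
            # Execute the plotting code and keep the rendered figure
            images.append(render(step.plot_code))
    return trace


# Stand-ins for the real model and renderer (assumptions for illustration only).
def fake_model(question: str, images: List[bytes]) -> Step:
    if not images:
        return Step("Draw the triangle first.", plot_code="plot_triangle()")
    return Step("The figure shows a right angle at B.", final=True)


def fake_render(code: str) -> bytes:
    return f"<image of {code}>".encode()


trace = reason_with_plots("Is ABC a right triangle?", fake_model, fake_render)
for step in trace:
    print(step.text)
```

The `max_steps` cap is a design choice of this sketch: it bounds the loop in case the model keeps requesting figures without ever reaching a final answer.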
Why CodePlot-CoT is Different
Unlike traditional VLMs that might struggle with the precision required for mathematical visual reasoning, CodePlot-CoT's unique method of generating and re-ingesting code-driven images provides a robust mechanism for accurate and verifiable visual thought processes. This makes it particularly effective for tasks requiring detailed geometric or quantitative analysis within a visual context. For more details, refer to the project homepage and the research paper.