jupyter-agent/jupyter-agent-qwen3-4b-thinking

Text Generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Sep 8, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

Jupyter Agent Qwen3-4B Thinking is a causal language model fine-tuned from Qwen3-4B-Thinking-2507 and optimized for data science agentic tasks in Jupyter notebook environments. It executes Python code, analyzes datasets, and provides step-by-step reasoning with intermediate computations to solve complex data analysis problems. The model supports tool calling for structured code execution and final-answer generation, achieves state-of-the-art performance among small models on realistic data analysis tasks, and has a context length of 32,768 tokens.


Overview

Jupyter Agent Qwen3-4B Thinking integrates directly into notebook environments, executing code with popular data science libraries such as pandas, numpy, and matplotlib. A core feature is its step-by-step reasoning with intermediate computations and thinking traces, which makes complex data analysis workflows easier to follow and audit.
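A minimal sketch of loading the model with the transformers library follows. The repo id is taken from this page's title and the prompt and generation settings are illustrative assumptions; check the model card on the Hub for recommended sampling parameters.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from this page's title; verify on the Hugging Face Hub.
model_id = "jupyter-agent/jupyter-agent-qwen3-4b-thinking"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # matches the BF16 precision listed above
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Load data.csv with pandas and report the mean of the 'price' column."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The model emits a thinking trace before its answer, so leave generous headroom.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```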

Key Capabilities

  • Jupyter-native agent: Operates directly within notebook environments.
  • Code execution: Capable of running Python code for data analysis.
  • Step-by-step reasoning: Generates detailed thought processes and intermediate results.
  • Dataset-grounded analysis: Trained on real Kaggle notebook workflows for practical application.
  • Tool calling: Supports structured code execution and final-answer generation (see the sketch below).
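
The sketch below shows how the two tools named above might be exposed through the chat template's tools argument, which recent transformers versions support for Qwen-family templates. The tool names and schemas here are illustrative assumptions, not the model's documented interface.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jupyter-agent/jupyter-agent-qwen3-4b-thinking")

# Illustrative tool schemas (names and fields are assumptions, not a documented spec):
# one tool to run code in the notebook, one to return the final answer.
tools = [
    {
        "type": "function",
        "function": {
            "name": "execute_code",
            "description": "Run a Python snippet in the Jupyter kernel and return its output.",
            "parameters": {
                "type": "object",
                "properties": {"code": {"type": "string"}},
                "required": ["code"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "final_answer",
            "description": "Report the final answer once the analysis is complete.",
            "parameters": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
                "required": ["answer"],
            },
        },
    },
]

messages = [{"role": "user", "content": "What is the median house price in data.csv?"}]
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
print(prompt)  # inspect how the tool definitions are injected into the prompt
```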

Performance & Training

This model shows significant improvement over its base model on the DABStep benchmark for data science tasks, achieving 70.8% on easy tasks compared to the base model's 44.0%. It was fine-tuned using full-parameter training on the Jupyter Agent Dataset, comprising 51,389 synthetic notebooks with dataset-grounded QA pairs and executable reasoning traces. The training utilized a context length of 32,768 tokens.
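For reference, the training data can presumably be pulled from the Hub with the datasets library. The repo id below is a guess inferred from the dataset's name, so confirm it on the Hub before use.

```python
from datasets import load_dataset

# Hypothetical repo id inferred from the dataset's name; verify on the Hub.
ds = load_dataset("jupyter-agent/jupyter-agent-dataset", split="train")
print(len(ds), ds.column_names)
```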

Use Cases

This model is ideal for developers and data scientists needing an intelligent agent to automate and assist with data analysis, code generation, and problem-solving directly within Jupyter notebooks. It's particularly effective for tasks requiring detailed reasoning and code execution in a sandboxed environment.
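
To illustrate the agentic loop such an assistant implies, here is a minimal sketch: generate a response, parse a tool call, execute the code, and feed the output back. The Qwen-style <tool_call> parsing and the exec-based "sandbox" are deliberately simplified assumptions; a real deployment should use a properly isolated Jupyter kernel.

```python
import contextlib
import io
import json
import re


def run_in_sandbox(code: str) -> str:
    """Naive stand-in for a real sandbox: exec in a scratch namespace, capture stdout."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {"__name__": "__sandbox__"})  # use a jailed kernel in production
    except Exception as exc:
        return f"Error: {exc}"
    return buf.getvalue()


def agent_step(model, tokenizer, messages):
    """One generate/execute round; the <tool_call> JSON format is an assumption."""
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=2048)
    text = tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if match:
        call = json.loads(match.group(1))
        if call.get("name") == "execute_code":
            result = run_in_sandbox(call["arguments"]["code"])
            messages.append({"role": "assistant", "content": text})
            messages.append({"role": "tool", "content": result})
            return None  # not done yet: keep looping with the tool output appended
        if call.get("name") == "final_answer":
            return call["arguments"]["answer"]
    return text  # plain response with no tool call; treat as final
```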