Transform72/PandasSolver: Specialized Pandas Code Generation
Transform72/PandasSolver is a 7-billion-parameter generative text model, fine-tuned from the CodeLlama transformer architecture and further enhanced with WizardCoder. Its primary strength is generating high-quality Pandas code from natural-language prompts.
Key Capabilities & Performance
- Exceptional Pandas Code Generation: Achieves 54.98% accuracy on DS-1000 Pandas Completion tasks, outperforming GPT-4 (43.99% as of August 2023) by 10.99 percentage points. The reported +38.83% improvement in Pandas accuracy is measured against a baseline that is not stated here.
- Broader Data Science Utility: While specialized in Pandas, it also shows improved performance on other libraries within the DS-1000 benchmark, including NumPy (+10.91%), Matplotlib (+1.29%), SciPy (+11.33%), Scikit-learn (+6.09%), and PyTorch (+7.35%).
- Fine-tuned on High-Quality Data: Trained on approximately 24 million tokens of high-quality Pandas question-and-answer pairs.
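
The two Pandas accuracy figures quoted above can be compared either in percentage points or as a relative change. The short sketch below does that arithmetic using only the two scores given in this card; the variable names are illustrative.

```python
# DS-1000 Pandas Completion accuracy figures quoted in this card (percent).
pandas_solver_acc = 54.98
gpt4_acc = 43.99  # GPT-4, August 2023

# Absolute gap, in percentage points.
abs_gap_points = pandas_solver_acc - gpt4_acc

# Relative gap: the absolute gap expressed as a share of the GPT-4 score.
rel_gap_percent = abs_gap_points / gpt4_acc * 100

print(f"absolute gap: {abs_gap_points:.2f} percentage points")
print(f"relative gap: {rel_gap_percent:.2f}%")
```

The absolute gap is 10.99 percentage points, and the relative gap is just under 25%. Neither matches +38.83%, so that figure presumably refers to a different baseline.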
Intended Use Cases
- Automated Pandas Scripting: Ideal for developers and data scientists needing to quickly generate Pandas code for data loading, manipulation, and analysis tasks.
- Educational Tool: Can assist in learning and practicing Pandas by providing code solutions to described problems.
- Efficiency in Data Workflows: Streamlines the process of converting problem descriptions into executable Pandas code, especially for complex data operations.
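
To illustrate the kind of request these use cases describe, the snippet below pairs a natural-language task with a hand-written Pandas solution. It is not output from the model: the `sales` DataFrame, its column names, and the task wording are all made up for this example.

```python
import pandas as pd

# Hypothetical task description, as a user might phrase it:
# "For each region, compute total revenue and the number of orders,
#  then sort regions by total revenue in descending order."

# Made-up sample data, for illustration only.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "south"],
        "revenue": [120.0, 80.0, 200.0, 50.0, 40.0],
    }
)

# Group by region, aggregate with named aggregations, then sort.
summary = (
    sales.groupby("region", as_index=False)
    .agg(total_revenue=("revenue", "sum"), n_orders=("revenue", "count"))
    .sort_values("total_revenue", ascending=False)
    .reset_index(drop=True)
)

print(summary)
```

A solution like this is easy to check against a small sample frame before running it on real data.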
Considerations
- Because the model has a relatively small parameter count, detailed prompts are recommended for best results.
- The model is currently tested with, and intended for, English-language prompts.
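
One way to make a prompt more detailed is to state the DataFrame's columns, a few sample rows, and the desired result explicitly. The helper below is a minimal sketch of that idea. `build_prompt` and its layout are hypothetical and do not reflect any prompt template the model was trained with.

```python
def build_prompt(task: str, columns: dict[str, str], sample_rows: list[dict]) -> str:
    """Assemble a detailed English prompt for a Pandas code request.

    `columns` maps column name -> dtype description; `sample_rows` holds a
    few example records. The layout is an assumption, not an official format.
    """
    lines = ["I have a Pandas DataFrame named `df` with these columns:"]
    for name, dtype in columns.items():
        lines.append(f"- {name} ({dtype})")
    if sample_rows:
        lines.append("Sample rows:")
        for row in sample_rows:
            lines.append(f"  {row}")
    lines.append(f"Task: {task}")
    lines.append("Write Pandas code that solves the task.")
    return "\n".join(lines)


prompt = build_prompt(
    task="Return the mean price per category, sorted from highest to lowest.",
    columns={"category": "string", "price": "float"},
    sample_rows=[{"category": "books", "price": 12.5}],
)
print(prompt)
```

Stating column names and dtypes up front leaves the model less to guess, which is the point of the recommendation above.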