TableLLM-8B: Tabular Data Manipulation for Office Scenarios
TableLLM-8B, developed by RUCKBReasoning, is an 8-billion-parameter large language model fine-tuned from Llama3.1-8B-Instruct that specializes in tabular data manipulation. It is engineered for real-world office scenarios, processing tabular data embedded in both spreadsheets and documents.
Key Capabilities
- Dual Output Modes: Generates either Python code solutions or direct text answers based on the task and data source.
- Code Generation: For spreadsheet-embedded tabular data, it handles operations such as insert, delete, update, query, merge, and plot.
- Text Generation: For document-embedded tabular data, it primarily focuses on query operations for short tables.
- Strong Performance: Achieves an average score of 86.7 across tabular benchmarks, outperforming models such as GPT-3.5 and CodeLlama and closely competing with GPT-4o in several categories.
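The routing between the two output modes can be sketched as a small helper that picks a prompt style based on where the table lives. This is a minimal illustration only: the template strings, mode names, and function below are assumptions for demonstration, not TableLLM's official prompt format.

```python
# Hypothetical prompt router mirroring TableLLM-8B's dual output modes.
# Templates are illustrative assumptions, not the model's official format.

CODE_PROMPT = (
    "Below is a spreadsheet table stored at {path}.\n"
    "Write Python code to answer: {question}\n"
)
TEXT_PROMPT = (
    "Answer the question based on the table below.\n"
    "{table}\n"
    "Question: {question}\n"
)

def build_prompt(question: str, source: str, table_or_path: str) -> str:
    """Route spreadsheet tasks to code generation, document tasks to text answers."""
    if source == "spreadsheet":
        return CODE_PROMPT.format(path=table_or_path, question=question)
    if source == "document":
        return TEXT_PROMPT.format(table=table_or_path, question=question)
    raise ValueError(f"unknown source: {source}")
```

In this sketch, spreadsheet-backed questions produce a code-generation prompt (the model then emits Python to run against the file), while short document-embedded tables are inlined for a direct text answer.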
Performance Highlights
TableLLM-8B demonstrates leading performance on several benchmarks:
- WikiSQL: Achieves 89.6, surpassing GPT-4o's 84.0.
- WikiTQ: Scores 89.1, closely behind GPT-4o's 91.5.
- TAT-QA: Scores 89.5, comparable to GPT-4o's 91.5.
- FeTaQA: Scores 93.4, closely behind GPT-4o's 94.4.
- Spider: Achieves 81.1, significantly outperforming GPT-4o's 69.5.
Use Cases
TableLLM-8B is ideal for applications requiring automated processing and analysis of tabular data, such as data cleaning, report generation, and complex data queries within business intelligence tools or office automation workflows. Its ability to generate executable code makes it particularly useful for programmatic data manipulation.
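To illustrate the kind of executable output such a model might emit for a merge-then-query task, here is a minimal pandas sketch; the column names and data are hypothetical, not drawn from any TableLLM benchmark.

```python
import pandas as pd

# Hypothetical example of model-generated code for a merge + query task.
orders = pd.DataFrame({"order_id": [1, 2, 3],
                       "customer": ["A", "B", "A"],
                       "amount": [100, 250, 75]})
regions = pd.DataFrame({"customer": ["A", "B"],
                        "region": ["East", "West"]})

# Merge order data with customer regions, then total the amount per region.
merged = orders.merge(regions, on="customer", how="left")
totals = merged.groupby("region")["amount"].sum().to_dict()
print(totals)  # {'East': 175, 'West': 250}
```

Because the output is ordinary pandas code, it can be reviewed, edited, and re-run inside an existing analysis pipeline rather than treated as an opaque answer.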