TableLLM-7b: Tabular Data Manipulation for Office Scenarios
TableLLM-7b, developed by RUCKBReasoning, is a 7-billion-parameter language model fine-tuned from CodeLlama-7b-Instruct-hf. It is designed to handle the tabular data manipulation tasks that arise in real office environments, whether the data is embedded in spreadsheets or in documents. Depending on the scenario, the model generates one of two kinds of output:
Key Capabilities
- Code Generation: For spreadsheet-embedded tabular data, TableLLM-7b generates Python code solutions to perform operations such as inserting, deleting, updating, querying, merging, and plotting tables.
- Text Generation: For document-embedded short tables, it provides direct text answers to queries.
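To make the code-generation capability concrete, a response to a merge-and-query request over spreadsheet-embedded tables might resemble the following pandas sketch. The table names, columns, and values here are illustrative, not taken from the model's actual training data:

```python
import pandas as pd

# Illustrative tables standing in for spreadsheet-embedded data.
employees = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "name": ["Ada", "Ben", "Cid"],
    "dept_id": [10, 10, 20],
})
departments = pd.DataFrame({
    "dept_id": [10, 20],
    "dept_name": ["Engineering", "Sales"],
})

# Merge the two tables on the shared key, then query one department.
merged = employees.merge(departments, on="dept_id", how="left")
engineering = merged[merged["dept_name"] == "Engineering"]
print(engineering["name"].tolist())  # → ['Ada', 'Ben']
```

Operations such as inserting, deleting, updating, and plotting would follow the same pattern: the model emits a short self-contained program against the user's table.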
Performance Highlights
TableLLM-7b has been evaluated on a range of benchmarks for both code and text generation. For code solution generation it achieves notable scores, including 86.6 on WikiSQL, 82.6 on Spider, and 78.8 on a self-created table operation benchmark. For text answer generation, it scores 58.8 on WikiTQ, 66.9 on TAT-QA, 72.6 on FeTaQA, and 63.1 on OTTQA. Overall, TableLLM-7b demonstrates strong performance, often outperforming other specialized models and remaining competitive with larger general-purpose LLMs such as GPT-3.5 on specific tabular tasks.
Prompting
The model uses distinct prompt templates for code and text generation. A code-solution prompt includes the header rows of the CSV data together with the question, while a text-answer prompt provides the table as text, the same table in CSV format, and the question to be answered.
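The two template styles can be sketched as simple string builders. The exact wording of TableLLM-7b's templates is not reproduced here; the `[INST] … [/INST]` markers follow the CodeLlama-Instruct convention of its base model, and the field names are hypothetical:

```python
# Hypothetical prompt builders mirroring the two template styles described
# above; the actual wording used by TableLLM-7b may differ.

CODE_PROMPT = (
    "[INST] Below is the header of a CSV file. "
    "Write a Python program to answer the question.\n\n"
    "CSV header:\n{csv_head}\n\n"
    "Question: {question} [/INST]"
)

TEXT_PROMPT = (
    "[INST] You are given a table in text and CSV form. "
    "Answer the question directly.\n\n"
    "Table (text):\n{table_text}\n\n"
    "Table (CSV):\n{table_csv}\n\n"
    "Question: {question} [/INST]"
)

def build_code_prompt(csv_head: str, question: str) -> str:
    """Prompt for spreadsheet-embedded data: expects a Python code answer."""
    return CODE_PROMPT.format(csv_head=csv_head, question=question)

def build_text_prompt(table_text: str, table_csv: str, question: str) -> str:
    """Prompt for document-embedded short tables: expects a direct text answer."""
    return TEXT_PROMPT.format(
        table_text=table_text, table_csv=table_csv, question=question
    )

prompt = build_code_prompt(
    "name,dept\nAda,Engineering",
    "How many employees are in Engineering?",
)
print(prompt)
```

The resulting string would then be passed to the model (for example, via a Hugging Face `transformers` text-generation pipeline) to obtain either code or a direct answer.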