Model Overview
DebateLabKIT/Llama-3.1-Argunaut-1-8B-SFT is an 8-billion-parameter language model fine-tuned from Meta's Llama-3.1-8B-Instruct. It was trained with TRL to excel at structured argumentation tasks, particularly those expressed in Argdown syntax, and demonstrates a strong ability to generate argument maps and to reconstruct arguments into premise-conclusion structures.
Key Capabilities
- Argdown Syntax Proficiency: Achieves a pass@1 score of 98.9% and a graph similarity of 65.5% on the Argdown Bench, significantly outperforming the base Llama-3.1-8B-Instruct model.
- Argument Mapping: Can convert natural language text into structured Argdown argument maps, identifying premises and conclusions.
- Argument Reconstruction: Capable of reconstructing arguments into standard premise-conclusion structures, although some examples show challenges with complex inference rules.
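To make the two capabilities above concrete, here is a minimal Argdown sketch of the kinds of output the model produces. The topic and statements are invented for illustration; the syntax (map relations `<+`/`<-`, numbered premises, an inference bar, and a conclusion) follows standard Argdown conventions.

```argdown
// Argument map: a claim with one supporting and one attacking argument.
[Free transit]: Public transport should be free.
  <+ <Climate argument>: Free transit reduces car use and emissions.
  <- <Cost objection>: Free transit strains municipal budgets.

// Premise-conclusion reconstruction of the supporting argument.
<Climate argument>

(1) Free public transport increases ridership.
(2) Higher ridership reduces car use.
(3) Reduced car use lowers emissions.
-----
(4) Free public transport lowers emissions.
```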
Training Details
The model was fine-tuned with SFT on 1 million examples for 1 epoch, using a context length of 8196 tokens with sequence packing. The training dataset mixture comprised DebateLabKIT/deepa2-conversations (25% of examples, 49% of tokens), DebateLabKIT/deep-argmap-conversations (25% of examples, 18% of tokens), and allenai/tulu-3-sft-mixture (50% of examples, 33% of tokens). Training was performed on two H100 GPUs.
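The mixture figures above can be sanity-checked with a quick calculation. The percentages are taken from the reported mixture; the per-dataset example counts and relative-length figures below are derived from them, not reported separately.

```python
# Reported mixture: (example share, token share) per dataset.
TOTAL_EXAMPLES = 1_000_000

mixture = {
    "DebateLabKIT/deepa2-conversations": (0.25, 0.49),
    "DebateLabKIT/deep-argmap-conversations": (0.25, 0.18),
    "allenai/tulu-3-sft-mixture": (0.50, 0.33),
}

# Derived example counts per dataset.
example_counts = {
    name: round(TOTAL_EXAMPLES * ex_share)
    for name, (ex_share, _) in mixture.items()
}

# Token share divided by example share shows how long each dataset's
# examples are relative to the mixture average (e.g. deepa2 conversations
# run roughly twice the average length).
for name, (ex_share, tok_share) in mixture.items():
    print(f"{name}: {example_counts[name]:,} examples, "
          f"relative length {tok_share / ex_share:.2f}x mixture average")
```

Note that deepa2-conversations supplies nearly half the training tokens from only a quarter of the examples, so its long argument-analysis dialogues dominate the token budget.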
Performance Notes
While excelling at Argdown-specific tasks, the model shows some decline on general reasoning benchmarks such as MMLU-Pro, MATH, and BBH relative to its base model, reflecting specialization toward its fine-tuning objective. Users should weigh this trade-off when considering general-purpose applications.