DebateLabKIT/Llama-3.1-Argunaut-1-8B-SFT
DebateLabKIT/Llama-3.1-Argunaut-1-8B-SFT is an 8 billion parameter instruction-tuned language model, fine-tuned from Meta's Llama-3.1-8B-Instruct. This model specializes in understanding and generating Argdown syntax for argument mapping and reconstruction, demonstrating significantly improved performance on Argdown-specific benchmarks compared to its base model. It is optimized for tasks requiring structured argumentation and logical inference, making it suitable for applications in debate analysis and critical thinking.
Model Overview
Fine-tuned from Meta's Llama-3.1-8B-Instruct using TRL, this 8-billion-parameter model is trained specifically for structured argumentation tasks expressed in Argdown syntax. It can generate argument maps from natural-language text and reconstruct arguments into explicit premise-conclusion structures.
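As a sketch of how the model might be prompted (the task wording, example argument, and generation settings below are illustrative assumptions, not taken from this card), a chat request for an Argdown reconstruction can be built like this; the actual `transformers` call is shown as a comment because it requires downloading the 8B checkpoint:

```python
# Illustrative sketch: building a chat-format prompt for the model.
# The instruction text and example argument are assumptions for demonstration.

model_id = "DebateLabKIT/Llama-3.1-Argunaut-1-8B-SFT"

def build_messages(source_text: str) -> list[dict]:
    """Wrap a natural-language argument in a chat request asking for
    an Argdown premise-conclusion reconstruction."""
    instruction = (
        "Reconstruct the following argument as a premise-conclusion "
        "structure in Argdown syntax:\n\n" + source_text
    )
    return [{"role": "user", "content": instruction}]

messages = build_messages(
    "All humans are mortal. Socrates is a human. So Socrates is mortal."
)

# With transformers installed and sufficient hardware, the messages can be
# passed to a text-generation pipeline, e.g.:
#
#   from transformers import pipeline
#   pipe = pipeline("text-generation", model=model_id)
#   print(pipe(messages, max_new_tokens=512)[0]["generated_text"])
```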
Key Capabilities
- Argdown Syntax Proficiency: Achieves a pass@1 score of 98.9% and a graph similarity of 65.5% on the Argdown Bench, significantly outperforming the base Llama-3.1-8B-Instruct model.
- Argument Mapping: Can convert natural language text into structured Argdown argument maps, identifying premises and conclusions.
- Argument Reconstruction: Capable of reconstructing arguments into standard premise-conclusion structures, although some examples show challenges with complex inference rules.
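To illustrate the two output formats the model targets, here is a generic hand-written Argdown snippet (an example of the syntax, not model output), showing an argument map followed by a premise-conclusion reconstruction of the same argument:

```argdown
// Argument map: the claim [Mortality] is supported (<+) by the argument <Syllogism>
[Mortality]: Socrates is mortal.
    <+ <Syllogism>: All humans are mortal, and Socrates is a human.

// Premise-conclusion reconstruction of <Syllogism>
<Syllogism>

(1) All humans are mortal.
(2) Socrates is a human.
-----
(3) Socrates is mortal.
```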
Training Details
The model was fine-tuned with SFT on 1 million examples for a single epoch, using a context length of 8196 tokens with sequence packing, on 2 x H100 GPUs. The training dataset mixture:
- DebateLabKIT/deepa2-conversations (25% of examples, 49% of tokens)
- DebateLabKIT/deep-argmap-conversations (25% of examples, 18% of tokens)
- allenai/tulu-3-sft-mixture (50% of examples, 33% of tokens)
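A comparable run could be sketched as a TRL CLI config. Only the base model, epoch count, context length, and packing flag below come from this card; the dataset shown is one of the three mixture datasets, and all remaining fields are illustrative assumptions (field names follow TRL's `SFTConfig` and may vary across TRL versions):

```yaml
# Illustrative TRL SFT config mirroring the settings above (not the actual recipe).
model_name_or_path: meta-llama/Llama-3.1-8B-Instruct
dataset_name: DebateLabKIT/deepa2-conversations  # one of the three mixture datasets
num_train_epochs: 1
max_seq_length: 8196
packing: true
bf16: true                        # assumption
per_device_train_batch_size: 8    # assumption
output_dir: llama-3.1-argunaut-1-8b-sft
```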
Performance Notes
While excelling at Argdown-specific tasks, the model declines somewhat on general reasoning benchmarks such as MMLU-Pro, MATH, and BBH relative to its base model, reflecting its specialization toward the fine-tuning objective. Users should weigh this trade-off before deploying it for general-purpose applications.