Model Overview
This model, AmsterdamDocClassificationLlama200T3Epochs, is a 7 billion parameter variant of meta-llama/Llama-2-7b-chat-hf, fine-tuned by FemkeBakker. It was developed as part of the "Assessing Large Language Models for Document Classification" project by the Municipality of Amsterdam. The fine-tuning focused on document classification using the AmsterdamBalancedFirst200Tokens dataset, which comprises documents truncated to their initial 200 tokens.
Key Capabilities
- Specialized Document Classification: Fine-tuned specifically for classifying documents based on their first 200 tokens.
- Llama-2 Base: Leverages the robust architecture of Llama-2-7b-chat-hf.
- Optimized Training: Underwent three epochs of fine-tuning, achieving a validation loss of 0.8116.
Training Details
The model was trained on 9900 documents and evaluated on 1100 documents, all formatted as conversations. Training hyperparameters included a learning rate of 1e-05, a train_batch_size of 2, and gradient_accumulation_steps of 8, resulting in a total batch size of 16. The training process took approximately 2 hours and 3 minutes. Further specifics and code can be found on the GitHub repository.
Good For
- Short Document Classification: Ideal for use cases where classification decisions can be made based on the initial segments of documents.
- Research in Document Classification: Suitable for researchers exploring the effectiveness of LLMs for document classification, particularly with truncated inputs.
- Amsterdam Municipality Projects: Directly relevant for applications within the Municipality of Amsterdam's document processing workflows.