FemkeBakker/AmsterdamDocClassificationLlama200T3Epochs

TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:4kPublished:Jun 3, 2024License:llama2Architecture:Transformer Open Weights Cold

The FemkeBakker/AmsterdamDocClassificationLlama200T3Epochs is a 7 billion parameter Llama-2-7b-chat-hf model, fine-tuned by FemkeBakker for document classification. This model was specifically trained for three epochs on the AmsterdamBalancedFirst200Tokens dataset, which consists of documents truncated to their first 200 tokens. It is optimized for classifying short document excerpts, achieving a validation loss of 0.8116. This model is a specialized tool for document classification tasks within the context of the Municipality of Amsterdam's research.

Loading preview...

Model Overview

This model, AmsterdamDocClassificationLlama200T3Epochs, is a 7 billion parameter variant of meta-llama/Llama-2-7b-chat-hf, fine-tuned by FemkeBakker. It was developed as part of the "Assessing Large Language Models for Document Classification" project by the Municipality of Amsterdam. The fine-tuning focused on document classification using the AmsterdamBalancedFirst200Tokens dataset, which comprises documents truncated to their initial 200 tokens.

Key Capabilities

  • Specialized Document Classification: Fine-tuned specifically for classifying documents based on their first 200 tokens.
  • Llama-2 Base: Leverages the robust architecture of Llama-2-7b-chat-hf.
  • Optimized Training: Underwent three epochs of fine-tuning, achieving a validation loss of 0.8116.

Training Details

The model was trained on 9900 documents and evaluated on 1100 documents, all formatted as conversations. Training hyperparameters included a learning rate of 1e-05, a train_batch_size of 2, and gradient_accumulation_steps of 8, resulting in a total batch size of 16. The training process took approximately 2 hours and 3 minutes. Further specifics and code can be found on the GitHub repository.

Good For

  • Short Document Classification: Ideal for use cases where classification decisions can be made based on the initial segments of documents.
  • Research in Document Classification: Suitable for researchers exploring the effectiveness of LLMs for document classification, particularly with truncated inputs.
  • Amsterdam Municipality Projects: Directly relevant for applications within the Municipality of Amsterdam's document processing workflows.