FemkeBakker/AmsterdamDocClassificationLlama200T1Epochs

Text generation model

  • Model size: 7B parameters (FP8 quantization)
  • Context length: 4k tokens
  • Published: Jun 3, 2024
  • License: llama2
  • Architecture: Transformer (open weights)

The FemkeBakker/AmsterdamDocClassificationLlama200T1Epochs model is a 7 billion parameter variant of Llama-2-7b-chat-hf, fine-tuned by FemkeBakker for document classification. Developed in collaboration with the Municipality of Amsterdam, the model classifies documents truncated to their first 200 tokens. It was fine-tuned for one epoch on a balanced dataset and reached a validation loss of 0.8403.


AmsterdamDocClassificationLlama200T1Epochs Overview

This model is a 7 billion parameter Llama-2-7b-chat-hf variant, fine-tuned by FemkeBakker as part of the Assessing Large Language Models for Document Classification project by the Municipality of Amsterdam. Its primary purpose is document classification.

Key Characteristics

  • Base Model: Fine-tuned from meta-llama/Llama-2-7b-chat-hf.
  • Specialization: Optimized for classifying documents based on their initial 200 tokens.
  • Training Data: Utilizes the AmsterdamBalancedFirst200Tokens dataset, consisting of 9900 training documents and 1100 evaluation documents, all formatted as conversations.
  • Training Epochs: Fine-tuned for a single epoch, with a total training time of 39 minutes.
  • Performance: Achieved a validation loss of 0.8403 on the evaluation set.
  • Hyperparameters: Key hyperparameters include a learning rate of 1e-05, a train_batch_size of 2, and gradient_accumulation_steps of 8.
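The truncation and batch-size details above can be sketched in a few lines of Python. Note that this is a simplified illustration: the actual preprocessing uses the Llama-2 tokenizer, whereas the whitespace split below is only a stand-in.

```python
def truncate_to_first_tokens(text: str, max_tokens: int = 200) -> str:
    """Keep only the first `max_tokens` tokens of a document.

    Simplified stand-in: the real pipeline tokenizes with the
    Llama-2 tokenizer, not a whitespace split.
    """
    tokens = text.split()
    return " ".join(tokens[:max_tokens])

# With train_batch_size=2 and gradient_accumulation_steps=8,
# each optimizer step sees an effective batch of 2 * 8 = 16 documents.
effective_batch_size = 2 * 8

doc = "word " * 500
truncated = truncate_to_first_tokens(doc)
print(len(truncated.split()))   # 200
print(effective_batch_size)     # 16
```

Gradient accumulation lets a small per-device batch (2) emulate a larger effective batch (16), which is a common memory-saving choice when fine-tuning 7B-scale models.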

Use Cases

This model is particularly suited for tasks requiring efficient document classification where only the initial segment of a document is relevant or available for analysis. It can be a valuable tool for organizations like the Municipality of Amsterdam in categorizing incoming documents quickly.
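Since the training data was formatted as conversations, inference presumably follows the Llama-2 chat template. The snippet below is a hedged sketch of how such a prompt could be assembled; the system instruction and the example document are hypothetical, not taken from the project's actual prompt design.

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Assemble a Llama-2 chat-style prompt string.

    Hypothetical example: the exact instruction wording and label
    set used during fine-tuning are not documented in this card.
    """
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = build_llama2_prompt(
    "Classify the following document into one of the municipal categories.",
    "Dear Sir/Madam, I hereby submit an objection to the parking permit decision...",
)
```

The resulting string could then be passed to the model through a standard text-generation pipeline, with the document text truncated to its first 200 tokens beforehand.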