kykim0/Llama-2-7b-ultrachat200k-2e

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Context Length: 4k | Published: Jan 14, 2024 | Architecture: Transformer

kykim0/Llama-2-7b-ultrachat200k-2e is a 7-billion-parameter model fine-tuned by kykim0 from the Llama-2-7b-hf base. It was trained on the HuggingFaceH4/ultrachat_200k dataset and reached a loss of 0.9258 on the evaluation set. The model targets general language generation tasks, with a Llama-2 architecture and a 4096-token context length.

Model Overview

kykim0/Llama-2-7b-ultrachat200k-2e builds on the Llama-2-7b-hf base and was fine-tuned by kykim0 on HuggingFaceH4/ultrachat_200k, a dataset of multi-turn dialogues intended to strengthen conversational and instruction-following behavior. Fine-tuning reached a loss of 0.9258 on the evaluation set.
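For reference, here is a minimal loading and generation sketch with Hugging Face Transformers. The model id comes from this card; the dtype, device placement, and prompt are illustrative assumptions.

```python
# Minimal sketch: load the fine-tuned checkpoint with Hugging Face Transformers.
# The model id comes from this card; dtype/device choices are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kykim0/Llama-2-7b-ultrachat200k-2e"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: half precision to fit a single GPU
    device_map="auto",
)

prompt = "Explain what fine-tuning a language model means."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```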

Training Details

The fine-tuning run used the following hyperparameters:

  • Learning Rate: 2e-05
  • Batch Size: 8 per device (train), 16 per device (eval)
  • Gradient Accumulation: 16 steps, for an effective batch size of 512 (8 per device × 16 accumulation steps × 4 GPUs)
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Epochs: 2.0
  • Scheduler: Cosine learning rate scheduler

This configuration ran across 4 GPUs, using Transformers 4.36.2 and PyTorch 2.1.2; a sketch of how these settings map onto Transformers training arguments follows below.
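As an illustration only, the reported hyperparameters translate into Hugging Face `TrainingArguments` roughly as follows. Dataset preparation, model loading, and the trainer itself are omitted, and values not listed on this card (warmup, weight decay, logging) are left at library defaults; this is not the author's actual training script.

```python
# Illustrative mapping of the reported hyperparameters onto TrainingArguments.
# Values not listed on this card are left at library defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-7b-ultrachat200k-2e",
    learning_rate=2e-05,
    per_device_train_batch_size=8,   # 8 x 16 accumulation x 4 GPUs = 512 effective
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    num_train_epochs=2.0,
    lr_scheduler_type="cosine",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```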

Potential Use Cases

Given its fine-tuning on a conversational dataset, this model is likely suitable for the following (a short chat-style usage sketch appears after the list):

  • Chatbot development
  • Instruction-following tasks
  • General text generation
  • Prototyping applications requiring a Llama-2 base
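A possible chat interaction, continuing from the loading sketch above. Whether the tokenizer bundles a chat template is an assumption not confirmed by this card; if it does not, fall back to plain prompts as in the loading example.

```python
# Hypothetical chat-style usage; continues from the loading sketch above
# (assumes `model` and `tokenizer` are already created for
# kykim0/Llama-2-7b-ultrachat200k-2e). The chat template is an assumption:
# this card does not confirm one is bundled with the tokenizer.
messages = [
    {"role": "user", "content": "Give me three tips for writing clear docstrings."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```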