Evan768/testEvan

Text Generation · Open Weights · Cold

  • Concurrency Cost: 1
  • Model Size: 7B
  • Quantization: FP8
  • Context Length: 4k
  • Published: Dec 26, 2024
  • License: llama2
  • Architecture: Transformer

The Evan768/testEvan model is a 7 billion parameter causal language model, fine-tuned from Meta's Llama-2-7b-chat-hf. It was fine-tuned with a learning rate of 2e-05 over 3 epochs. While specific differentiators and intended uses are not detailed, it can serve as a base for further fine-tuning or for exploring Llama-2 derivatives.


Overview

Evan768/testEvan is a 7 billion parameter language model, fine-tuned from the meta-llama/Llama-2-7b-chat-hf base model. It was developed using the Hugging Face Trainer, with Transformers 4.47.1 and PyTorch 2.5.1+cu118.

Training Details

The model was fine-tuned with the following key hyperparameters, mirrored in the code sketch after this list:

  • Learning Rate: 2e-05
  • Batch Size: 4 (for both training and evaluation)
  • Epochs: 3
  • Optimizer: AdamW with default betas and epsilon
  • LR Scheduler: Linear
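
The actual training script is not published. The following is a minimal sketch of how these hyperparameters map onto Hugging Face `TrainingArguments`, assuming a standard causal-LM fine-tuning setup; the toy dataset, `output_dir`, and prompt text are placeholders, since the real corpus is not documented.

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Base model named in the card; the repo is gated behind Meta's license.
BASE_MODEL = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# The fine-tuning corpus is not documented; this toy dataset is a
# stand-in so the sketch runs end to end.
dataset = Dataset.from_dict({"text": ["Hello, world!"] * 8})
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=4096),
    batched=True,
    remove_columns=["text"],
)

# Hyperparameters from the card; everything else is left at Trainer
# defaults (AdamW with default betas and epsilon).
args = TrainingArguments(
    output_dir="testEvan",            # placeholder output directory
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    lr_scheduler_type="linear",       # also the Trainer default
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    # mlm=False makes the collator pad batches and copy input_ids to labels,
    # the standard setup for causal-LM fine-tuning.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```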

Key Characteristics

  • Base Model: Meta Llama-2-7b-chat-hf
  • Parameter Count: 7 billion
  • Context Length: 4096 tokens
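
Since the card does not document an inference recipe, here is a minimal loading sketch using the Transformers API. The dtype, generation settings, and prompt format are assumptions: float16 is illustrative (the card lists an FP8 serving quantization), and the `[INST]` template is inherited from the Llama-2-chat base model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Evan768/testEvan"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # assumption; pick a dtype your hardware supports
    device_map="auto",          # requires the accelerate package
)

# Llama-2-chat derivatives typically expect the base model's [INST] format.
prompt = "[INST] Summarize what a 7B parameter language model is. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Keep prompt plus output within the 4096-token context window.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```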

Limitations

The model card does not specify the training dataset, intended uses, or limitations. Users should exercise caution and evaluate the model against their own applications before deployment, as its capabilities and optimized use cases are not explicitly defined.