BEAT-LLM-Backdoor/Mistral-3-7B_word

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Oct 13, 2024 · License: other · Architecture: Transformer

BEAT-LLM-Backdoor/Mistral-3-7B_word is a 7 billion parameter language model fine-tuned from mistralai/Mistral-7B-Instruct-v0.3. It was trained with a learning rate of 2e-05 over 5 epochs using a cosine learning rate scheduler. The model card does not detail what differentiates the model, but its training configuration suggests a focus on instruction-following tasks, making it suitable for applications that need a moderately sized, instruction-tuned model.


Model Overview

BEAT-LLM-Backdoor/Mistral-3-7B_word is a 7 billion parameter language model fine-tuned from the mistralai/Mistral-7B-Instruct-v0.3 base model. The training procedure is documented below, but the model card does not describe the model's unique capabilities, intended uses, or limitations.
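
The card omits usage instructions. The following is a minimal inference sketch, assuming the repository id above resolves on the Hugging Face Hub and that the tokenizer ships the standard Mistral-Instruct chat template; the prompt and generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEAT-LLM-Backdoor/Mistral-3-7B_word"  # repository id from the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Mistral-Instruct models use a chat template; apply_chat_template
# formats the conversation into the expected prompt layout.
messages = [{"role": "user", "content": "Summarize what a cosine learning rate scheduler does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```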

Training Details

The fine-tuning process used the following key hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 2e-05
  • Batch Size: A train_batch_size of 4 and eval_batch_size of 8 were used, resulting in a total_train_batch_size of 16 and total_eval_batch_size of 32 across 4 GPUs.
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08.
  • Scheduler: A cosine learning rate scheduler with a warmup ratio of 0.1.
  • Epochs: The model was trained for 5 epochs.
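
For reference, here is a minimal sketch of how these hyperparameters map onto transformers.TrainingArguments; the output directory is a placeholder, and any option not listed above is an assumption:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-3-7b_word",   # hypothetical path, not from the card
    learning_rate=2e-05,
    per_device_train_batch_size=4,    # 4 x 4 GPUs = total train batch size 16
    per_device_eval_batch_size=8,     # 8 x 4 GPUs = total eval batch size 32
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```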

Framework Versions

The training was conducted using the following framework versions; a matching requirements pin is sketched after the list:

  • Transformers 4.43.3
  • PyTorch 2.3.1
  • Datasets 2.20.0
  • Tokenizers 0.19.1
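
To reproduce this environment, the versions above can be pinned, for example in a requirements.txt (note that the PyTorch pip package is named torch):

```
transformers==4.43.3
torch==2.3.1
datasets==2.20.0
tokenizers==0.19.1
```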

Key Considerations

Due to the limited information provided in the model card, specific intended uses, limitations, and detailed performance metrics are not available. Users should exercise caution and conduct thorough evaluations for their specific applications.