quriousclick/tinyllama-v1-training

Text Generation · Model Size: 1.1B · Quant: BF16 · Context Length: 2k · License: apache-2.0 · Architecture: Transformer · Open Weights

The quriousclick/tinyllama-v1-training model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0, developed by quriousclick. It is based on the TinyLlama architecture, a compact 1.1-billion-parameter language model, and was trained for 250 steps with a learning rate of 0.0002, using the Adam optimizer and a cosine learning rate schedule. Further details on its specific capabilities and intended uses are not provided in the available documentation.


Overview

The quriousclick/tinyllama-v1-training model is a fine-tuned version of the TinyLlama/TinyLlama-1.1B-Chat-v1.0 base model. Developed by quriousclick, it builds on the compact 1.1-billion-parameter TinyLlama architecture, which is known for its efficiency.
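
Since the card includes no usage instructions, the following is a minimal inference sketch using the standard transformers API. It assumes the checkpoint loads like its base model and inherits the TinyLlama-1.1B-Chat-v1.0 chat template; the prompt is illustrative only.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "quriousclick/tinyllama-v1-training"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # Assumes the chat template carries over from TinyLlama-1.1B-Chat-v1.0.
    messages = [{"role": "user", "content": "Summarize what a language model does."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )

    output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))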

Training Details

The model was fine-tuned with the following key hyperparameters (a sketch expressing them as Trainer arguments follows the list):

  • Learning Rate: 0.0002
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Scheduler: Cosine learning rate scheduler
  • Training Steps: 250
  • Batch Size: train_batch_size of 16 with gradient_accumulation_steps of 4, for an effective total_train_batch_size of 64
  • Mixed Precision: Native AMP, used for training efficiency
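
For reference, the reported hyperparameters map onto transformers.TrainingArguments roughly as follows. This is a reconstruction, not the author's actual training script: output_dir is a placeholder, the optimizer betas and epsilon match the values reported above, and bf16 mixed precision is an assumption based on the BF16 weights (the card only says Native AMP).

    from transformers import TrainingArguments

    # Hypothetical reconstruction of the reported hyperparameters; output_dir
    # and the bf16 flag are assumptions, not taken from the card.
    training_args = TrainingArguments(
        output_dir="tinyllama-v1-training",  # placeholder
        learning_rate=2e-4,                  # reported as 0.0002
        per_device_train_batch_size=16,      # train_batch_size
        gradient_accumulation_steps=4,       # 16 * 4 = effective batch of 64
        max_steps=250,
        lr_scheduler_type="cosine",
        adam_beta1=0.9,
        adam_beta2=0.999,
        adam_epsilon=1e-8,
        bf16=True,                           # "Native AMP"; BF16 assumed from the weights
    )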

Limitations

Detailed information regarding the specific dataset used for fine-tuning, the model's intended uses, its limitations, and evaluation data is not provided in the current documentation. Users should exercise caution and conduct their own evaluations for specific applications.