flytech/Ruckus-13b-30

Text Generation · Model Size: 13B · Quantization: FP8 · Context Length: 4k · Architecture: Transformer · Concurrency Cost: 1

flytech/Ruckus-13b-30 is a 13-billion-parameter language model fine-tuned from Meta's Llama-2-13b-hf. It was trained with a learning rate of 0.0002 and a batch size of 32 for one epoch. The available documentation does not spell out its primary differentiators or optimal use cases, but it builds on the well-established Llama-2 foundation.


Ruckus-13b-30 Overview

The flytech/Ruckus-13b-30 model is a fine-tuned variant of Meta's Llama-2-13b-hf. This 13-billion-parameter model inherits the foundational capabilities of Llama-2, a widely used open-weight large language model family.
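Since the model follows the standard Llama-2 causal-LM interface, it should load through the usual Transformers pipeline. The sketch below is illustrative only: it assumes the checkpoint is hosted on the Hugging Face Hub under the ID "flytech/Ruckus-13b-30" and that no custom loading code is required; the prompt and generation settings are arbitrary.

```python
# Minimal inference sketch, assuming the checkpoint is available on the
# Hugging Face Hub under "flytech/Ruckus-13b-30" and exposes the standard
# Llama-2 causal-LM interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flytech/Ruckus-13b-30"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # a 13B model in fp16 needs roughly 26 GB of memory
    device_map="auto",          # requires `accelerate` to place weights across devices
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```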

Training Details

The model was fine-tuned with the following key hyperparameters (a configuration sketch follows the list):

  • Learning Rate: 0.0002
  • Batch Size: 32 (for both training and evaluation)
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • LR Scheduler Type: Constant
  • Epochs: 1
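For reference, the reported hyperparameters map directly onto Hugging Face TrainingArguments, as sketched below. This is not the author's actual training script: the output directory is a placeholder, and the dataset, data collator, and any parameter-efficient fine-tuning settings are undocumented and therefore omitted.

```python
# Illustrative mapping of the reported hyperparameters onto TrainingArguments.
# Only the values documented in the card are set; everything else is left at
# library defaults or omitted because it is not documented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ruckus-13b-30",     # hypothetical output path
    learning_rate=2e-4,             # 0.0002, as reported
    per_device_train_batch_size=32, # batch size 32 for training and evaluation
    per_device_eval_batch_size=32,
    num_train_epochs=1,
    lr_scheduler_type="constant",
    adam_beta1=0.9,                 # Adam betas and epsilon, as reported
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```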

The training used Transformers 4.33.3, PyTorch 2.0.1+cu118, Datasets 2.14.5, and Tokenizers 0.13.3.
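If you want to match that environment, a quick check like the one below compares your installed versions against the ones the card documents. Other versions may well work; these are simply the versions reported above.

```python
# Compare installed library versions against those reported in the card.
import transformers, torch, datasets, tokenizers

expected = {
    "transformers": "4.33.3",
    "torch": "2.0.1+cu118",
    "datasets": "2.14.5",
    "tokenizers": "0.13.3",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, version in expected.items():
    status = "OK" if installed[name] == version else f"differs ({installed[name]})"
    print(f"{name}: expected {version}, {status}")
```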

Current Limitations

The available documentation does not yet specify the dataset used for fine-tuning, the model's intended uses, or its limitations. Users should exercise caution and evaluate the model themselves before relying on it for a particular application.