flytech/Ruckus-13b-Y

Text generation · 13B parameters · FP8 quantization · 4k context length · Transformer architecture · Concurrency cost: 1

Ruckus-13b-Y is a 13 billion parameter causal language model developed by flytech, fine-tuned from Meta's Llama-2-13b-hf. It was trained with a learning rate of 0.0002 over 8 epochs. Its specific differentiators and primary use cases are not detailed in the available documentation.


Ruckus-13b-Y: A Fine-Tuned Llama 2 Model

Ruckus-13b-Y is a 13 billion parameter language model developed by flytech, based on meta-llama/Llama-2-13b-hf. The model has been fine-tuned, though the dataset used for that training is currently unspecified.
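
Because the card does not document a prompt format, a plain causal-LM call is the safest starting point. Below is a minimal loading sketch, assuming the weights are published on the Hugging Face Hub under the flytech/Ruckus-13b-Y repository ID:

```python
# Minimal inference sketch. Assumes the weights live on the Hugging Face Hub
# under "flytech/Ruckus-13b-Y"; adjust the repo ID, dtype, and device as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flytech/Ruckus-13b-Y"  # assumed Hub repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # a 13B model in fp16 needs roughly 26 GB of memory
    device_map="auto",          # spread layers across available devices
)

inputs = tokenizer("Tell me about llamas.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```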

Training Details

The fine-tuning procedure for Ruckus-13b-Y used the following hyperparameters; a code sketch mapping them onto a standard training configuration follows the list:

  • Learning Rate: 0.0002
  • Batch Size: 64 (for both training and evaluation)
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • LR Scheduler Type: Constant
  • Epochs: 8
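
For concreteness, here is how those reported values would map onto the Hugging Face TrainingArguments API. This is a sketch only: the actual training script, dataset, and any additional settings are not published, and the card does not say whether the batch size of 64 is per device or total.

```python
# Hypothetical reconstruction of the reported hyperparameters using the
# transformers Trainer API. The output directory is a placeholder; the
# original training script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./ruckus-13b-y",     # placeholder path
    learning_rate=2e-4,              # reported: 0.0002
    per_device_train_batch_size=64,  # reported batch size (per-device assumed)
    per_device_eval_batch_size=64,   # reported eval batch size
    num_train_epochs=8,              # reported: 8 epochs
    lr_scheduler_type="constant",    # reported: constant LR schedule
    adam_beta1=0.9,                  # reported Adam betas
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # reported Adam epsilon
)
```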

The training stack comprised Transformers 4.33.2, PyTorch 2.0.1+cu118, Datasets 2.14.5, and Tokenizers 0.13.3.
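
When trying to reproduce the original environment, the installed versions can be compared against the reported ones at runtime. A small sketch, assuming all four packages are installed:

```python
# Print installed versions and compare them against the training stack
# reported above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": (transformers, "4.33.2"),
    "torch": (torch, "2.0.1+cu118"),
    "datasets": (datasets, "2.14.5"),
    "tokenizers": (tokenizers, "0.13.3"),
}
for name, (module, version) in expected.items():
    status = "OK" if module.__version__ == version else f"got {module.__version__}"
    print(f"{name} {version}: {status}")
```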

Current Limitations

Detailed information about the model's capabilities, intended uses, limitations, and the exact nature of its training and evaluation data is not yet available. Until further documentation appears, the unique strengths and optimal applications of Ruckus-13b-Y remain undefined, and users should evaluate the model on their own tasks before relying on it.