nlile/PE-13b-full

TEXT GENERATION

  • Concurrency Cost: 1
  • Model Size: 13B
  • Quantization: FP8
  • Context Length: 4k
  • Published: Nov 28, 2023
  • Architecture: Transformer

PE-13b-full is a 13 billion parameter language model developed by nlile, fine-tuned from StableBeluga-13B. It reaches a low validation loss of 0.0094 and a high reward accuracy of 0.9916, indicating strong performance in its fine-tuned domain. The training dataset and primary use cases are not documented, but these metrics suggest the model was tuned with a reward-based objective for tasks that demand high accuracy.


Model Overview

PE-13b-full is a 13 billion parameter language model, fine-tuned by nlile from the stabilityai/StableBeluga-13B base model. It was trained for 3 epochs with a learning rate of 3e-07 and a total batch size of 64 distributed across 8 GPUs. On its evaluation set it achieves a validation loss of 0.0094 and a reward accuracy of 0.9916.
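
The card does not state the training objective, but "reward accuracy" is the metric typically logged by preference-based fine-tuning (e.g., reward modeling or DPO): the fraction of preference pairs where the chosen response out-scores the rejected one. A minimal sketch of that definition, assuming PE-13b-full follows this convention (the card does not confirm it):

```python
from typing import Sequence

def reward_accuracy(chosen_rewards: Sequence[float],
                    rejected_rewards: Sequence[float]) -> float:
    """Fraction of preference pairs where the chosen response scores
    higher than the rejected one. This mirrors the common definition
    in preference-based fine-tuning; the card does not confirm that
    PE-13b-full used this exact objective."""
    assert len(chosen_rewards) == len(rejected_rewards)
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)

# Toy data: the chosen response wins 2 of 3 pairs.
print(reward_accuracy([1.2, 0.8, 2.1], [0.3, 1.0, 0.5]))  # 0.666...
```

Under this definition, the reported 0.9916 would mean the model ranked the preferred response above the rejected one on roughly 99.2% of evaluation pairs.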

Key Characteristics

  • Base Model: Fine-tuned from StableBeluga-13B.
  • Parameter Count: 13 billion parameters.
  • Training Metrics: Achieved a final validation loss of 0.0094 and a reward accuracy of 0.9916.
  • Training Procedure: Adam optimizer, a linear learning-rate scheduler with a warmup ratio of 0.1, and distributed training across 8 GPUs (a configuration sketch follows this list).
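
These hyperparameters map directly onto a standard Hugging Face Trainer setup. Below is a hypothetical reconstruction; the per-device batch size of 8 is an assumption (64 total across 8 GPUs, with no gradient accumulation), since the card gives only the aggregate figure:

```python
from transformers import TrainingArguments

# Only the values named on the card (3 epochs, lr 3e-07, warmup ratio 0.1,
# linear schedule, Adam, 64 total batch across 8 GPUs) are sourced; the
# rest are illustrative assumptions.
args = TrainingArguments(
    output_dir="pe-13b-full",
    num_train_epochs=3,
    learning_rate=3e-07,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    optim="adamw_torch",            # card says "Adam"; exact variant unstated
    per_device_train_batch_size=8,  # assumed: 64 total / 8 GPUs
    gradient_accumulation_steps=1,  # assumed
    bf16=True,                      # assumed precision for a 13B run
)
```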

Potential Use Cases

Given its fine-tuning and high reward accuracy, PE-13b-full is likely suited to applications that require precise responses and strong adherence to a reward signal. Specific intended uses are not documented, but its metrics point to tasks where accuracy and controlled output generation are critical.
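
For completeness, a minimal generation sketch with the transformers library, assuming the weights load like any causal LM. The repository id comes from the page header; the prompt template is an assumption carried over from the StableBeluga base model, which the card does not confirm:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nlile/PE-13b-full"  # repo id from the page header
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Assumed prompt template, borrowed from StableBeluga-13B; the card
# does not document the expected format.
prompt = "### User:\nSummarize the benefits of learning-rate warmup.\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Greedy decoding (do_sample=False) is used here to match the "controlled output generation" framing above; sampling parameters can be swapped in for more varied responses.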