Name: nlile/PE-7b-full API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: nlile

Model Overview

nlile/PE-7b-full is a 7-billion-parameter language model fine-tuned from stabilityai/StableBeluga-7B. It was trained for 3 epochs with a learning rate of 3e-07 and a total batch size of 64 across 8 GPUs, and reached a final validation loss of 0.0066.
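The total batch size of 64 across 8 GPUs can be reached through several per-device settings. The sketch below shows the usual arithmetic; the per-device batch size and gradient accumulation steps are not stated in the model card, so the values used here are assumptions, and the function name `effective_batch_size` is hypothetical.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Total number of examples contributing to one optimizer step."""
    return per_device * num_devices * grad_accum_steps

NUM_GPUS = 8            # stated in the model card
TOTAL_BATCH_SIZE = 64   # stated in the model card

# Assumed splits (not from the model card) that both yield a total of 64.
candidates = [(8, 1), (4, 2)]  # (per-device batch size, gradient accumulation steps)
for per_device, accum in candidates:
    total = effective_batch_size(per_device, NUM_GPUS, accum)
    print(per_device, accum, total == TOTAL_BATCH_SIZE)
```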

Key Performance Metrics

During evaluation, PE-7b-full reported the following preference-reward metrics:

Rewards/accuracies: 0.9888, meaning the chosen response received the higher reward in about 98.9% of evaluation pairs.
Rewards/margins: 29.0043, the average gap between chosen and rejected reward scores (-0.4634 - (-29.4677) = 29.0043).
Rewards/chosen: -0.4634, the average reward for chosen responses.
Rewards/rejected: -29.4677, the average reward for rejected responses.

These metrics suggest the model reliably separates preferred from rejected outputs on its evaluation set, based on its reward signal.
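The relationship between these figures can be checked with a few lines of Python. This is a minimal sketch, assuming the margin is the chosen reward minus the rejected reward and that accuracy is the fraction of pairs in which the chosen response scores higher. The per-pair values in `example_pairs` are invented for illustration and do not come from the model's evaluation data.

```python
# Values reported in the model card's evaluation summary.
REWARDS_CHOSEN = -0.4634
REWARDS_REJECTED = -29.4677

def reward_margin(chosen: float, rejected: float) -> float:
    """Gap between the chosen and rejected reward scores."""
    return chosen - rejected

def reward_accuracy(pairs: list[tuple[float, float]]) -> float:
    """Fraction of (chosen, rejected) pairs where the chosen reward is higher."""
    return sum(c > r for c, r in pairs) / len(pairs)

# Hypothetical per-pair rewards, invented purely for illustration.
example_pairs = [(-0.2, -30.1), (-0.9, -25.4), (-1.5, -0.7), (-0.1, -31.0)]

print(round(reward_margin(REWARDS_CHOSEN, REWARDS_REJECTED), 4))
print(reward_accuracy(example_pairs))
```

With the reported values, the computed margin matches the reported Rewards/margins figure of 29.0043.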

Training Configuration

The model was trained using the following hyperparameters:

Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
LR Scheduler: Linear with 0.1 warmup ratio
Frameworks: Transformers 4.35.0, PyTorch 2.1.1+cu121, Datasets 2.14.6, Tokenizers 0.14.1
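To make the scheduler setting concrete, here is a small pure-Python sketch of a linear schedule with warmup, using the stated peak learning rate of 3e-07 and warmup ratio of 0.1. The total step count is not given in the model card, so `total_steps=1000` is an assumption, as is rounding the warmup step count up. The function `linear_lr` is a hypothetical helper, not the training code that was actually used.

```python
import math

def linear_lr(step: int, total_steps: int, peak_lr: float = 3e-07,
              warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step`: linear ramp up to peak_lr, then linear decay to 0."""
    # Assumption: warmup steps are the warmup ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale from peak_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# Assumed total of 1000 optimizer steps, chosen only for illustration.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, total_steps=1000))
```

Under these assumptions the rate climbs for the first 100 steps, peaks at 3e-07, and then falls linearly to zero at the last step.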

Potential Use Cases

Given its strong reward modeling performance, nlile/PE-7b-full could be particularly effective in applications such as:

Reinforcement Learning from Human Feedback (RLHF): As a reward model to guide the training of other generative models.
Content Moderation: Identifying and filtering undesirable content based on learned preferences.
Response Ranking: Scoring and ranking generated text responses for quality or relevance.
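The response-ranking use case can be sketched as follows. The scoring function here is a toy stand-in, not the model: `toy_score` and `rank_responses` are hypothetical names, and in practice the scorer would be replaced by whatever reward or preference score is obtained from the model.

```python
from typing import Callable, List, Tuple

def rank_responses(prompt: str, responses: List[str],
                   score_fn: Callable[[str, str], float]) -> List[Tuple[float, str]]:
    """Score each response for the prompt and return them best-first."""
    scored = [(score_fn(prompt, response), response) for response in responses]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

def toy_score(prompt: str, response: str) -> float:
    """Placeholder scorer (an assumption): counts prompt words reused in the response."""
    prompt_words = set(prompt.lower().split())
    return float(sum(word in prompt_words for word in response.lower().split()))

candidates = [
    "I do not know.",
    "The capital of France is Paris.",
    "France is a country.",
]
for score, response in rank_responses("What is the capital of France", candidates, toy_score):
    print(score, response)
```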
