Mihaiii/Pallas-0.5-LASER-0.5

Text generation | Concurrency cost: 2 | Model size: 34B | Quant: FP8 | Context length: 32k | Published: Jan 2, 2024 | License: yi-license | Architecture: Transformer

Mihaiii/Pallas-0.5-LASER-0.5 is a 34-billion-parameter language model developed by Mihaiii. It is one in a series of LASER (LAyer-SElective Rank reduction) iterations built on the Pallas-0.5 base model. Compared with its predecessors, it achieves lower validation and test logloss on the causal_judgement subset of the BigBench dataset. The intervention reduces the rank of MLP weight matrices, with the goal of improving performance on causal judgment tasks.


Model Overview

Mihaiii/Pallas-0.5-LASER-0.5 is a 34-billion-parameter language model built on Pallas-0.5-LASER-0.4. It adds a LASER intervention that reduces the rank of MLP weight matrices to improve performance on causal judgment tasks.

Key Characteristics

  • LASER Intervention: Applies a LASER intervention configured with lnum: 54 (the layer index), lnames: mlp (the MLP weight matrices), and rate: 8 (how aggressively the rank is reduced).
  • Dataset Focus: The intervention was applied and evaluated using the causal_judgement subset of the BigBench dataset.
  • Performance Improvement: Achieves lower validation and test logloss than the previous LASER iterations (Pallas-0.5-LASER-0.1 through Pallas-0.5-LASER-0.4). This means that, on average, the model assigns higher probability to the correct answers on causal judgment tasks.
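The core of a LASER intervention is replacing a weight matrix with a low-rank approximation obtained from its singular value decomposition. The NumPy sketch below illustrates that idea under several assumptions that do not come from this card:

- `laser_low_rank` is a hypothetical helper.
- A small random matrix stands in for the real layer-54 MLP weights.
- The mapping from `rate` to retained rank follows what is assumed to be the reference LASER code's convention: it keeps a fraction (10 - rate) × 0.1 of the maximum rank, so rate 8 keeps 20%. Check this against the LASER repository before relying on it.

```python
import numpy as np


def laser_low_rank(weight: np.ndarray, rate: float) -> np.ndarray:
    """Return a truncated-SVD (low-rank) approximation of a 2-D weight matrix.

    Hypothetical helper. ASSUMPTION: `rate` keeps a fraction (10 - rate) * 0.1
    of the maximum rank, as assumed for the reference LASER code; rate=8 keeps 20%.
    """
    if weight.ndim != 2:
        raise ValueError("expected a 2-D weight matrix")
    max_rank = min(weight.shape)
    keep = max(1, int(max_rank * (10 - rate) * 0.1))

    # Thin SVD: weight = u @ diag(s) @ vt, with singular values sorted descending.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)

    # Keep only the `keep` largest singular values and rebuild a matrix of the
    # original shape: the best rank-`keep` approximation (Eckart-Young theorem).
    return (u[:, :keep] * s[:keep]) @ vt[:keep, :]


# Toy stand-in for an MLP weight matrix (real 34B-model weights are far larger).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))
w_reduced = laser_low_rank(w, rate=8)

print(w_reduced.shape)                   # (64, 128): shape is unchanged
print(np.linalg.matrix_rank(w_reduced))  # rank left after the intervention
```

Rebuilding a full-shape matrix, rather than storing the two factors, keeps the result a drop-in replacement for the original weights. It does not by itself reduce the parameter count.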

Performance Metrics

The model keeps validation accuracy consistent while reducing logloss:

  • Validation Accuracy: 55.263%
  • Validation Logloss: 1.484 (down from 1.525 for Pallas-0.5-LASER-0.4 and 1.650 for base Pallas-0.5)
  • Test Accuracy: 61.842%
  • Test Logloss: 1.297 (down from 1.326 for Pallas-0.5-LASER-0.4 and 1.463 for base Pallas-0.5)
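The logloss figures above can be expressed as relative reductions with a few lines of arithmetic. A toy example also shows why logloss can fall while accuracy stays the same. Two points are assumptions:

- The probabilities in the toy example are invented for illustration.
- The scoring treats the task as two-option (Yes/No). Accuracy counts items where the correct option gets more than half the probability, and logloss is the mean negative log-probability of the correct option. This is a generic definition, not necessarily the exact scoring used by the LASER evaluation scripts.

```python
import math

# Logloss values as reported in the metrics list (lower is better).
REPORTED = {
    "validation": {"Pallas-0.5": 1.650, "Pallas-0.5-LASER-0.4": 1.525, "Pallas-0.5-LASER-0.5": 1.484},
    "test": {"Pallas-0.5": 1.463, "Pallas-0.5-LASER-0.4": 1.326, "Pallas-0.5-LASER-0.5": 1.297},
}


def relative_reduction(old: float, new: float) -> float:
    """Fractional decrease in logloss when moving from `old` to `new`."""
    return (old - new) / old


for split, scores in REPORTED.items():
    for baseline in ("Pallas-0.5", "Pallas-0.5-LASER-0.4"):
        r = relative_reduction(scores[baseline], scores["Pallas-0.5-LASER-0.5"])
        print(f"{split}: {r:.1%} lower logloss than {baseline}")


def accuracy_and_logloss(p_correct: list[float]) -> tuple[float, float]:
    """Score a two-option task from the probability given to the correct option.

    Accuracy counts items where the correct option gets more than half the
    probability; logloss is the mean negative log-probability of that option.
    """
    acc = sum(p > 0.5 for p in p_correct) / len(p_correct)
    loss = -sum(math.log(p) for p in p_correct) / len(p_correct)
    return acc, loss


# Hypothetical probabilities on the correct answer, before and after an edit:
# the same items are right and wrong, but the model is more confident when it
# is right and less confident in the wrong answer when it is wrong.
before = [0.60, 0.55, 0.30, 0.20]
after = [0.70, 0.65, 0.40, 0.30]
print(accuracy_and_logloss(before))
print(accuracy_and_logloss(after))
```

The loop prints reductions of about 10.1% (validation) and 11.3% (test) relative to base Pallas-0.5, and about 2.7% and 2.2% relative to Pallas-0.5-LASER-0.4. In the toy example both cases score 50% accuracy, while logloss drops from roughly 0.98 to 0.73.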

Usage Notes

To reproduce this model on a single A100 GPU, refer to the specific branch of the LASER repository, because the original LASER code may run out of memory (OOM) on 34B models.
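The out-of-memory remark can be sanity-checked with weight-only arithmetic. This sketch rests on several assumptions:

- It uses the nominal 34B parameter count.
- It assumes 4, 2 or 1 bytes per parameter for fp32, 16-bit or FP8 weights.
- It uses the 40 GB and 80 GB A100 memory variants.
- It ignores activations, the KV cache and any SVD workspace, so real peak usage is higher.

It is not a diagnosis of the specific OOM errors mentioned above, whose cause the card does not spell out.

```python
# Back-of-the-envelope weight memory for a 34B-parameter model.
# ASSUMPTIONS: nominal 34e9 parameters; weights only (no activations,
# KV cache, or SVD workspace), so actual peak usage is higher.
PARAMS = 34e9
A100_MEMORY_GB = (40, 80)  # the two A100 memory variants


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for label, nbytes in (("fp32", 4), ("bf16/fp16", 2), ("fp8", 1)):
    gb = weight_memory_gb(PARAMS, nbytes)
    fits = [cap for cap in A100_MEMORY_GB if gb < cap]
    print(f"{label}: {gb:.0f} GB of weights; fits on A100 variants: {fits or 'none'}")
```

At 16-bit precision the weights alone take about 68 GB. That rules out a 40 GB A100 and leaves roughly 12 GB on an 80 GB card for everything else the rank-reduction step needs.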