Mihaiii/Pallas-0.5-LASER-0.2

Task: Text Generation
Concurrency Cost: 2
Model Size: 34B
Quant: FP8
Ctx Length: 32k
Published: Jan 1, 2024
License: yi-license
Architecture: Transformer

Mihaiii/Pallas-0.5-LASER-0.2 is a 34 billion parameter language model with a 32768 token context length, developed by Mihaiii. It applies a LASER intervention, a rank-reduction technique applied to attention layer weights, on top of Pallas-0.5-LASER-0.1. On the BigBench causal judgment subset, it reaches slightly higher test accuracy and slightly lower test logloss than the base Pallas-0.5 model.

Model Overview

Mihaiii/Pallas-0.5-LASER-0.2 is a 34 billion parameter language model that integrates a LASER (LAyer-SElective Rank reduction) intervention. It is an iteration on Mihaiii/Pallas-0.5-LASER-0.1, which in turn derives from the base Pallas-0.5 model.

Key Characteristics

  • LASER Intervention: Applies a rank-reduction technique to the attention layer weights (self_attn.k_proj.weight, self_attn.q_proj.weight, self_attn.v_proj.weight, self_attn.o_proj.weight) at layer 58.
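  • Performance Improvement: On the causal_judgement subset of the bigbench dataset, test accuracy rises from 60.526 to 61.184 and test logloss falls from 1.463 to 1.458 relative to the base Pallas-0.5 model. Validation accuracy is unchanged at 55.263. Compared to Pallas-0.5-LASER-0.1, test accuracy is identical and logloss is marginally higher.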
  • Context Length: Supports a context window of 32768 tokens.
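
The rank-reduction step named above can be illustrated with a truncated SVD, which gives the best low-rank approximation of a matrix in the Frobenius norm. This is a minimal sketch on a random matrix, not the LASER codebase itself: the function name `reduce_rank`, the `keep_fraction` parameter, and the 64x64 stand-in for a projection weight such as self_attn.q_proj.weight are all assumptions made for illustration, and the actual rate parameterization used for this model may differ.

```python
import numpy as np


def reduce_rank(weight: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Return a low-rank approximation of `weight` via truncated SVD.

    `keep_fraction` (hypothetical parameter) is the share of singular
    values to keep; the rest are discarded.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    k = max(1, int(round(keep_fraction * s.size)))
    # Rebuild the matrix from the k largest singular components only.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


# Stand-in for one attention projection weight (assumed shape, random data).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w_low = reduce_rank(w, keep_fraction=0.25)

print(w_low.shape)                   # same shape as the original weight
print(np.linalg.matrix_rank(w_low))  # rank is now 16 instead of 64
```

The replaced weight keeps its original shape, so it can be written back into the layer without changing the architecture; only its rank drops.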

Performance Metrics (on BigBench causal_judgement)

Model                  Validation acc   Validation logloss   Test acc   Test logloss
Pallas-0.5             55.263           1.650                60.526     1.463
Pallas-0.5-LASER-0.1   55.263           1.639                61.184     1.451
Pallas-0.5-LASER-0.2   55.263           1.646                61.184     1.458
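
To make the size of these differences explicit, the short sketch below computes each model's change relative to the base Pallas-0.5 row. The numbers are copied from the table above; the dictionary layout and the helper name `delta_vs_base` are assumptions introduced only for this illustration.

```python
# Metrics copied from the table above (BigBench causal_judgement).
results = {
    "Pallas-0.5": {
        "val_acc": 55.263, "val_logloss": 1.650,
        "test_acc": 60.526, "test_logloss": 1.463,
    },
    "Pallas-0.5-LASER-0.1": {
        "val_acc": 55.263, "val_logloss": 1.639,
        "test_acc": 61.184, "test_logloss": 1.451,
    },
    "Pallas-0.5-LASER-0.2": {
        "val_acc": 55.263, "val_logloss": 1.646,
        "test_acc": 61.184, "test_logloss": 1.458,
    },
}


def delta_vs_base(model: str, base: str = "Pallas-0.5") -> dict:
    """Return metric-by-metric differences (model minus base), rounded to 3 places."""
    return {
        metric: round(results[model][metric] - results[base][metric], 3)
        for metric in results[base]
    }


for name in ("Pallas-0.5-LASER-0.1", "Pallas-0.5-LASER-0.2"):
    print(name, delta_vs_base(name))
```

For accuracy a positive difference is an improvement, while for logloss a negative difference is an improvement.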

Replication

Replication on a single A100 GPU is possible using a specific branch of the LASER codebase, which addresses out-of-memory issues for 34B models.