Mihaiii/Pallas-0.5-LASER-0.4

Text Generation | Concurrency Cost: 2 | Model Size: 34B | Quant: FP8 | Ctx Length: 32k | Published: Jan 2, 2024 | License: yi-license | Architecture: Transformer

Mihaiii/Pallas-0.5-LASER-0.4 is a 34-billion-parameter language model developed by Mihaiii and derived from the Pallas-0.5 model. It applies a LASER intervention, a rank reduction of selected MLP weight matrices, and reports better validation logloss and test accuracy than the base model on the causal_judgement subset of the BigBench dataset, which makes it a candidate for reasoning tasks of that kind.

Overview

Mihaiii/Pallas-0.5-LASER-0.4 is a 34-billion-parameter model that applies a LASER (LAyer-SElective Rank reduction) intervention to the base Pallas-0.5 model. This iteration, version 0.4, builds on the previous LASER-intervened versions (0.1, 0.2, 0.3) by further refining the intervention parameters.

Key Characteristics

  • LASER Intervention: Applies a rank-reduction intervention to the MLP weight matrices (mlp.gate_proj.weight, mlp.up_proj.weight, mlp.down_proj.weight) of layer 55, with a rate of 9.
  • Performance Improvement: Validation logloss and test accuracy on the causal_judgement subset of the BigBench dataset improve progressively from the base Pallas-0.5 through the earlier LASER versions. Pallas-0.5-LASER-0.4 reaches a test accuracy of 61.842% and a test logloss of 1.326, against 60.526% and 1.463 for the base Pallas-0.5.
  • Hardware Optimization: The intervention process for this 34B model has been adapted to run on a single A100 GPU, addressing out-of-memory issues encountered with the original LASER code.
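
The rank-reduction step described above can be illustrated with a truncated SVD. This is a minimal sketch, not the code used to produce this model: the function names are hypothetical, the matrix is a small random stand-in for an MLP weight, and the mapping from the rate to a kept fraction of (10 - rate) / 10 is an assumption about how the rate is interpreted, which the text above does not spell out.

```python
import numpy as np


def retained_rank(shape, rate):
    """Rank to keep for a matrix of `shape`.

    Assumption: `rate` lies on a 0-10 scale and the kept fraction of the
    maximum rank is (10 - rate) / 10, so rate 9 keeps about 10%.
    """
    keep_fraction = (10 - rate) * 0.1
    return max(1, int(min(shape) * keep_fraction))


def low_rank_approx(weight, rank):
    """Best rank-`rank` approximation of `weight` via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Toy stand-in for one MLP weight matrix (real shapes are far larger).
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 256))

rank = retained_rank(weight.shape, rate=9)
reduced = low_rank_approx(weight, rank)

print("kept rank:", rank)
print("shape unchanged:", reduced.shape == weight.shape)
print("relative error:", np.linalg.norm(weight - reduced) / np.linalg.norm(weight))
```

The reduced matrix has the same shape as the original, so it can replace the original weight in place. A random matrix loses most of its content at this rate; the premise of the intervention is that trained weights behave differently.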

Use Cases

  • Reasoning Tasks: Aimed at tasks that call for reasoning, particularly causal judgment, the BigBench subset on which its gains were measured.
  • Research and Development: Suitable for researchers studying how rank-reduction interventions such as LASER affect specific capabilities of large language models.
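
For readers comparing the accuracy and logloss figures quoted under Key Characteristics, the sketch below shows the standard definitions of both metrics on a multiple-choice task and the gaps between the reported numbers. The page does not say how the metrics were computed, so the `evaluate` helper and the toy predictions are illustrative assumptions; only the four reported figures come from the text.

```python
import math


def evaluate(predictions):
    """Return (accuracy in %, mean logloss).

    `predictions` is a list of (probabilities_per_choice, gold_index) pairs.
    Logloss here is the mean negative log-probability of the gold answer.
    """
    correct = 0
    total_nll = 0.0
    for probs, gold in predictions:
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == gold:
            correct += 1
        total_nll += -math.log(probs[gold])
    n = len(predictions)
    return 100.0 * correct / n, total_nll / n


# Toy binary predictions (hypothetical, not model output).
toy = [([0.7, 0.3], 0), ([0.4, 0.6], 0), ([0.2, 0.8], 1), ([0.9, 0.1], 0)]
toy_accuracy, toy_logloss = evaluate(toy)
print(f"toy accuracy: {toy_accuracy:.1f}%  toy logloss: {toy_logloss:.4f}")

# Figures reported above: (test accuracy in %, test logloss).
reported = {
    "Pallas-0.5": (60.526, 1.463),
    "Pallas-0.5-LASER-0.4": (61.842, 1.326),
}
base_acc, base_loss = reported["Pallas-0.5"]
laser_acc, laser_loss = reported["Pallas-0.5-LASER-0.4"]
accuracy_gain = laser_acc - base_acc      # in percentage points
logloss_drop = base_loss - laser_loss     # lower logloss is better
print(f"accuracy gain: {accuracy_gain:.3f} points  logloss drop: {logloss_drop:.3f}")
```

Accuracy only counts whether the top choice is right, while logloss also rewards assigning more probability to the correct answer, which is why the two metrics can move by different amounts.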