Mihaiii/Pallas-0.5-LASER-0.1

Text Generation | Concurrency Cost: 2 | Model Size: 34B | Quant: FP8 | Ctx Length: 32k | Published: Dec 30, 2023 | License: yi-license | Architecture: Transformer

Mihaiii/Pallas-0.5-LASER-0.1 is a 34 billion parameter language model based on Pallas-0.5 and modified with a LASER intervention. On the causal_judgement subset of the BigBench dataset it shows marginal gains in test accuracy and a slightly lower test logloss than the base model. It is aimed at researchers exploring rank-reduction techniques in large language models.


Overview

Mihaiii/Pallas-0.5-LASER-0.1 is a 34 billion parameter model derived from the Pallas-0.5 base and carrying a first LASER (LAyer-SElective Rank reduction) intervention. The intervention replaces selected weight matrices with lower-rank approximations, here targeting the attention layer weights (k_proj, q_proj, v_proj, o_proj).
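
The core operation behind rank reduction can be illustrated with a truncated SVD on a toy matrix. This is a minimal sketch, not the LASER codebase: the function name `low_rank_approx`, the `keep_fraction` parameter, and the 64x64 random matrix are all assumptions made for illustration, and the choice of which layers and ranks to reduce (the part LASER searches over) is left out entirely.

```python
import numpy as np


def low_rank_approx(weight: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Return a rank-k approximation of `weight` via truncated SVD.

    `keep_fraction` (hypothetical parameter) is the share of singular
    values to keep; k is at least 1.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    k = max(1, int(round(keep_fraction * s.size)))
    # Keep only the k largest singular values and their vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


# Toy stand-in for an attention projection such as q_proj (assumed shape).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w_reduced = low_rank_approx(w, keep_fraction=0.25)

print("original rank:", np.linalg.matrix_rank(w))
print("reduced rank: ", np.linalg.matrix_rank(w_reduced))
print("relative error:", np.linalg.norm(w - w_reduced) / np.linalg.norm(w))
```

The reduced matrix keeps the original shape, so it can stand in for the original weight without changing the surrounding layer; only its rank, and therefore the information it carries, is lower.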

Key Characteristics

  • Base Model: Pallas-0.5, a 34 billion parameter model.
  • Intervention: Utilizes the LASER technique, focusing on rank-reduction within the attention mechanism.
  • Targeted Improvement: Evaluated and optimized on the causal_judgement subset of the BigBench dataset.
  • Performance: Demonstrates slight improvements in test accuracy (from 60.526% to 61.184%) and a reduction in test logloss (from 1.463 to 1.451) compared to the base Pallas-0.5 model on the specified dataset.
  • Reproducibility: The intervention can be replicated on a single A100 GPU using a specialized branch of the LASER codebase, addressing memory constraints for 34B models.
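
To put the reported numbers in perspective, the short calculation below derives the absolute and relative changes from the four figures quoted above. The variable names are illustrative; no values beyond those four are assumed.

```python
# Figures quoted above for the BigBench causal_judgement test split.
base_acc, laser_acc = 60.526, 61.184          # test accuracy, percent
base_logloss, laser_logloss = 1.463, 1.451    # test logloss

acc_gain_pp = laser_acc - base_acc            # percentage points
logloss_drop = base_logloss - laser_logloss   # absolute reduction
logloss_drop_rel = 100 * logloss_drop / base_logloss  # percent of base

print(f"accuracy gain: {acc_gain_pp:.3f} percentage points")
print(f"logloss drop:  {logloss_drop:.3f} ({logloss_drop_rel:.2f}% of base)")
```

Both changes are under one percentage point or one percent of the base value, which matches the description of the improvement as slight.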

Use Cases

This model is primarily suited for:

  • Research in Model Intervention: Exploring the effects and efficacy of LASER interventions on large language models.
  • Performance Analysis: Investigating how rank-reduction techniques impact specific task performance, particularly in causal judgment scenarios.
  • Resource-Constrained Experimentation: Providing a benchmark for running 34B parameter model interventions on single A100 GPUs.