Mihaiii/Pallas-0.5-LASER-0.3

Text Generation | Concurrency Cost: 2 | Model Size: 34B | Quant: FP8 | Ctx Length: 32k | Published: Jan 1, 2024 | License: yi-license | Architecture: Transformer | Cold

Mihaiii/Pallas-0.5-LASER-0.3 is a 34-billion-parameter language model derived from the Pallas-0.5 model by applying a LASER intervention. Compared with its base model, it reports higher accuracy and lower logloss on the BigBench causal judgment subset. It is best suited to research and applications that rely on targeted interventions in model weights to adjust model behavior.


Overview

Mihaiii/Pallas-0.5-LASER-0.3 is a 34-billion-parameter model produced by applying a LASER (LAyer-SElective Rank reduction) intervention to the base Pallas-0.5 model. The intervention replaces the MLP weight matrices (gate_proj, up_proj, down_proj) of a selected layer (lnum: 56 in the README configuration) with lower-rank approximations, using a rate of 9.5, with the aim of improving the model's performance on particular tasks.
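
To make the idea of rank reduction concrete, the following is a minimal sketch on a toy matrix using a truncated SVD in NumPy. It is not the code used to build this model. The helper names (`rate_to_keep_fraction`, `low_rank_approx`) are hypothetical, and the mapping from the rate value to a kept rank (rate on a 0 to 10 scale, with (10 - rate) / 10 of the maximum rank retained) is an assumption that this page does not confirm.

```python
import numpy as np


def rate_to_keep_fraction(rate: float) -> float:
    # ASSUMPTION (not stated on this page): rate lives on a 0-10 scale and
    # the fraction of the maximum rank that is kept is (10 - rate) / 10.
    return (10.0 - rate) / 10.0


def low_rank_approx(weight: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Return the best rank-k approximation of `weight` (in Frobenius norm)."""
    max_rank = min(weight.shape)
    rank = max(1, int(max_rank * keep_fraction))
    # Thin SVD: weight = u @ diag(s) @ vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 256))  # toy stand-in for one MLP projection matrix
w_reduced = low_rank_approx(w, rate_to_keep_fraction(9.5))

print("shape:", w_reduced.shape)
print("rank after reduction:", np.linalg.matrix_rank(w_reduced))
```

The reduced matrix has the same shape as the original, so it can be written back in place of the original weights without changing the surrounding architecture. Only its rank, and therefore the information it can carry, is lowered.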

Key Enhancements

  • Targeted Intervention: Applies a rank-reduction intervention to the model's MLP weight matrices to improve task performance.
  • Performance Improvement: Reports higher validation and test accuracy, together with lower logloss, on the BigBench causal judgment dataset. Pallas-0.5-LASER-0.3 reaches a test accuracy of 61.842% and a logloss of 1.382, versus 60.526% and 1.463 for the base Pallas-0.5 model (a gain of 1.316 percentage points in accuracy and a reduction of 0.081 in logloss).
  • Reproducibility: The README provides specific configurations (lnum: 56, lnames: mlp, rate: 9.5, dataset: bigbench causal_judgement, intervention type: rank-reduction) for replicating the intervention.
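
To put the reported numbers in perspective, here is a small arithmetic sketch. It computes the accuracy and logloss deltas from the figures quoted above, then searches for the smallest test-split size whose integer counts of correct answers would round to both reported percentages. The split size is not stated on this page, so the result of that search is an inference from rounding, not a reported fact; `smallest_split_size` is a hypothetical helper.

```python
# Figures quoted on this page.
base = {"test_accuracy_pct": 60.526, "logloss": 1.463}
laser = {"test_accuracy_pct": 61.842, "logloss": 1.382}

# Absolute gain in accuracy, in percentage points.
acc_gain = round(laser["test_accuracy_pct"] - base["test_accuracy_pct"], 3)
# Absolute and relative reduction in logloss.
logloss_drop = round(base["logloss"] - laser["logloss"], 3)
rel_logloss_drop_pct = round(100 * logloss_drop / base["logloss"], 1)


def smallest_split_size(acc_a, acc_b, max_n=1000, tol=0.0005):
    """Smallest n for which both percentages match k/n to 3 decimals.

    Returns (n, correct_a, correct_b), or None if nothing up to max_n fits.
    """
    for n in range(1, max_n + 1):
        k_a = round(acc_a * n / 100)
        k_b = round(acc_b * n / 100)
        if abs(100 * k_a / n - acc_a) < tol and abs(100 * k_b / n - acc_b) < tol:
            return n, k_a, k_b
    return None


print("accuracy gain (points):", acc_gain)
print("logloss drop:", logloss_drop, f"({rel_logloss_drop_pct}% relative)")
print("smallest consistent split:", smallest_split_size(
    base["test_accuracy_pct"], laser["test_accuracy_pct"]))
```

On these figures the search returns a split of 76 examples with 46 versus 47 correct answers. If the test split is indeed that size, the accuracy gain corresponds to a single additional correct example, so the logloss reduction may be the more informative signal.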

Use Cases

  • Research on Model Interventions: Ideal for researchers exploring the effects of LASER interventions on large language models.
  • Task-Specific Optimization: Suitable for applications where adjusting model behavior for a specific cognitive task, such as causal judgment, is critical.
  • Resource-Efficient Experimentation: The model was developed with a single A100 GPU in mind, which makes experimentation with large models more accessible.
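
As a rough check on the single-GPU point, the sketch below estimates the memory needed for the weights alone at several precisions. It assumes the nominal 34B parameter count from the listing and an 80 GB A100 variant; neither the exact parameter count nor the GPU variant is given on this page. Activations, the KV cache, and framework overhead are deliberately left out.

```python
# Nominal parameter count from the listing ("Model Size: 34B").
N_PARAMS = 34e9

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}

# ASSUMPTION: the 80 GB A100 variant (a 40 GB variant also exists).
A100_MEMORY_GB = 80


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory for the weights only, in decimal gigabytes."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


fits = {}
for dtype in BYTES_PER_PARAM:
    gb = weight_memory_gb(N_PARAMS, dtype)
    fits[dtype] = gb <= A100_MEMORY_GB
    print(f"{dtype}: {gb:.0f} GB of weights, fits in {A100_MEMORY_GB} GB: {fits[dtype]}")
```

Under these assumptions the weights fit on one card at FP16 or FP8 but not at FP32, which is consistent with the FP8 quantization shown in the listing.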