Mihaiii/Pallas-0.5-LASER-0.4
Mihaiii/Pallas-0.5-LASER-0.4 is a 34-billion-parameter language model developed by Mihaiii and derived from the Pallas-0.5 model. It applies a LASER intervention, a rank-reduction technique applied to MLP weight matrices, and reports better validation logloss and test accuracy than the base model on the causal_judgement subset of the BigBench dataset, which makes it a candidate for tasks that depend on causal reasoning.
Overview
Mihaiii/Pallas-0.5-LASER-0.4 is a 34-billion-parameter model that applies a LASER (LAyer-SElective Rank reduction) intervention to the base Pallas-0.5 model. This iteration, version 0.4, builds on the earlier LASER-intervened versions (0.1, 0.2, 0.3) by further refining the intervention parameters.
Key Characteristics
- LASER Intervention: Applies a rank-reduction intervention to the MLP weight matrices (mlp.gate_proj.weight, mlp.up_proj.weight, mlp.down_proj.weight) at layer 55 with a rate of 9.
- Performance Improvement: Shows progressive improvements in validation logloss and test accuracy on the causal_judgement subset of the BigBench dataset compared to the base Pallas-0.5 and earlier LASER versions. For instance, Pallas-0.5-LASER-0.4 achieves a test accuracy of 61.842% and a test logloss of 1.326, outperforming the base Pallas-0.5 (60.526% accuracy, 1.463 logloss).
- Hardware Optimization: The intervention process for this 34B model has been adapted to run on a single A100 GPU, addressing out-of-memory issues encountered with the original LASER code.
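The rank-reduction step described in the list above can be illustrated with a truncated SVD on a single weight matrix. This is a minimal sketch, not the author's intervention code: the function names rank_reduce and keep_fraction_from_rate are hypothetical, the toy matrix stands in for a real MLP weight, and the mapping from "rate" to the fraction of singular values kept (a 0 to 10 scale where rate 9 keeps about 10%) is an assumption that the model card does not state.

```python
import numpy as np


def keep_fraction_from_rate(rate: float) -> float:
    # ASSUMPTION: "rate" is on a 0-10 scale and (10 - rate) / 10 of the
    # maximum rank is kept. The model card only says "a rate of 9".
    return (10.0 - rate) / 10.0


def rank_reduce(weight: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Return a low-rank approximation of `weight` via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    max_rank = min(weight.shape)
    k = max(1, int(max_rank * keep_fraction))  # number of singular values kept
    # Rebuild the matrix from the k largest singular components only.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


# Toy stand-in for one MLP weight matrix (the real ones are far larger).
rng = np.random.default_rng(0)
toy_weight = rng.standard_normal((32, 48))
reduced = rank_reduce(toy_weight, keep_fraction_from_rate(9))
print(reduced.shape, np.linalg.matrix_rank(reduced))
```

The reduced matrix keeps the original shape, so it can replace the original weight in place. In a real intervention the same operation would be applied to each selected matrix of the chosen layer, which is where memory use becomes a concern for a 34B model.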
Use Cases
- Reasoning Tasks: Aimed at tasks that require causal judgment, as suggested by its results on the causal_judgement subset of the BigBench dataset.
- Research and Development: Suitable for researchers studying how rank-reduction interventions such as LASER affect specific capabilities of large language models.