Mihaiii/Pallas-0.5-LASER-0.2
Mihaiii/Pallas-0.5-LASER-0.2 is a 34-billion-parameter language model with a 32768-token context length, developed by Mihaiii. It applies a LASER intervention, a rank-reduction technique applied to attention layer weights, and builds on Pallas-0.5-LASER-0.1. On the BigBench causal judgment subset, it shows slightly higher test accuracy and lower logloss than the base Pallas-0.5 model.
Model Overview
Mihaiii/Pallas-0.5-LASER-0.2 is a 34-billion-parameter language model that integrates a LASER (LAyer-SElective Rank reduction) intervention. It is an iteration on Mihaiii/Pallas-0.5-LASER-0.1, which in turn derives from the base Pallas-0.5 model.
Key Characteristics
- LASER Intervention: Applies a rank-reduction technique to the attention layer weights (`self_attn.k_proj.weight`, `self_attn.q_proj.weight`, `self_attn.v_proj.weight`, `self_attn.o_proj.weight`) at layer 58.
- Performance Improvement: Demonstrates improved test accuracy and reduced logloss on the `bigbench` dataset, specifically the `causal_judgement` subset, compared to the base `Pallas-0.5` model.
- Context Length: Features a substantial context window of 32768 tokens.
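As a rough illustration of what rank reduction does to a single weight matrix, the sketch below replaces a matrix with its truncated-SVD approximation using NumPy. It is not the code used to produce this model: the function name `rank_reduce`, the `keep_fraction` parameter, and the 64x64 random stand-in matrix are assumptions made for the example, and the card does not state what rank was retained.

```python
import numpy as np


def rank_reduce(weight: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Return a low-rank approximation of `weight` via truncated SVD.

    `keep_fraction` is a hypothetical knob: the share of singular values kept.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    k = max(1, int(len(s) * keep_fraction))  # number of singular values kept
    # Scale the first k left singular vectors by their singular values,
    # then multiply by the first k right singular vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


# Toy stand-in for one attention projection matrix (real ones are far larger).
rng = np.random.default_rng(0)
toy_weight = rng.standard_normal((64, 64))
reduced = rank_reduce(toy_weight, keep_fraction=0.25)

print("shape:", reduced.shape)
print("rank:", np.linalg.matrix_rank(reduced))
print("relative error:", np.linalg.norm(toy_weight - reduced) / np.linalg.norm(toy_weight))
```

The reduced matrix keeps the original shape, so it can stand in for the original weights without any change to the surrounding architecture; only its rank drops.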
Performance Metrics (on BigBench causal_judgement)
| Model | Validation acc | Validation logloss | Test acc | Test logloss |
|---|---|---|---|---|
| Pallas-0.5 | 55.263 | 1.650 | 60.526 | 1.463 |
| Pallas-0.5-LASER-0.1 | 55.263 | 1.639 | 61.184 | 1.451 |
| Pallas-0.5-LASER-0.2 | 55.263 | 1.646 | 61.184 | 1.458 |
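To make the size of the differences easier to read, the following sketch computes each model's change relative to the base Pallas-0.5 row. The numbers are copied from the table; the helper name `deltas_vs_base` and the column keys are illustrative choices, not part of any released tooling.

```python
# Metrics copied from the table above:
# (validation acc, validation logloss, test acc, test logloss)
metrics = {
    "Pallas-0.5": (55.263, 1.650, 60.526, 1.463),
    "Pallas-0.5-LASER-0.1": (55.263, 1.639, 61.184, 1.451),
    "Pallas-0.5-LASER-0.2": (55.263, 1.646, 61.184, 1.458),
}
columns = ("val_acc", "val_logloss", "test_acc", "test_logloss")


def deltas_vs_base(name: str, base: str = "Pallas-0.5") -> dict[str, float]:
    """Difference between a model's metrics and the base model's, rounded to 3 places."""
    return {
        col: round(value - ref, 3)
        for col, value, ref in zip(columns, metrics[name], metrics[base])
    }


for name in metrics:
    print(name, deltas_vs_base(name))
```

Positive accuracy deltas and negative logloss deltas both indicate an improvement over the base model.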
Replication
Replication on a single A100 GPU is possible using a specific branch of the LASER codebase, which addresses out-of-memory issues for 34B models.