adamo1139/LWM-7B-1M-1000000ctx-AEZAKMI-3_1-1702
The adamo1139/LWM-7B-1M-1000000ctx-AEZAKMI-3_1-1702 is a 7-billion-parameter LargeWorldModel (LWM) fine-tuned on the AEZAKMI v3.1 dataset using QLoRA at a maximum sequence length of 4000 tokens. The fine-tune aims to preserve the base model's long-context capabilities rather than extend them, making it suitable for tasks that require deep comprehension over long passages.
Model Overview
The adamo1139/LWM-7B-1M-1000000ctx-AEZAKMI-3_1-1702 builds on the 7-billion-parameter LargeWorldModel (LWM), whose base supports a context window of up to 1,000,000 tokens, and fine-tunes it on the AEZAKMI v3.1 dataset. Training used QLoRA with lora_r 32 and a cosine learning-rate schedule decaying from an initial 0.00015, run over several epochs.
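As a rough illustration of those hyperparameters, a peft/transformers configuration could look like the sketch below. Everything marked as an assumption (alpha, target modules, dropout, epoch count, batch settings, output path) is illustrative and not taken from the card:

```python
# Hypothetical QLoRA hyperparameter sketch mirroring the reported values
# (lora_r 32, cosine learning-rate decay from 0.00015).
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                                   # lora_r 32, as reported on the card
    lora_alpha=32,                          # assumption; alpha not reported
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    lora_dropout=0.0,                       # assumption
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="lwm-aezakmi-qlora",         # hypothetical output path
    learning_rate=0.00015,                  # initial learning rate, as reported
    lr_scheduler_type="cosine",             # cosine decay, as reported
    num_train_epochs=3,                     # "several epochs"; exact count assumed
    per_device_train_batch_size=1,          # assumption for a single 24 GB GPU
    gradient_accumulation_steps=8,          # assumption
    bf16=True,
)
```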
Key Capabilities
- Extended Context Window: Built on an LWM base with a 1,000,000-token context window; the fine-tune itself ran at a maximum sequence length of 4000 tokens, with the intent of keeping the base model's long-input processing intact.
- Efficient Fine-tuning: Fine-tuned with unsloth and FlashAttention 2 (FA2) on a single RTX 3090 Ti in roughly 6 hours, a notably lightweight setup for a 7B model (see the loading sketch after this list).
- Long Context Retention: The fine-tuning process was specifically aimed at preserving the base model's ability to handle and reason over extensive contextual information.
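Per the card, the run used unsloth with FA2 at a 4000-token maximum sequence length. A minimal sketch of loading the base model that way follows; the base checkpoint name and every PEFT setting other than r=32 are assumptions:

```python
# Sketch: loading the base model with unsloth for QLoRA-style fine-tuning.
# The card reports unsloth + FA2 on an RTX 3090 Ti; checkpoint name and
# PEFT settings beyond r=32 are illustrative assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LargeWorldModel/LWM-Text-1M",  # assumed base checkpoint
    max_seq_length=4000,       # training sequence length stated on the card
    load_in_4bit=True,         # 4-bit base weights, as in QLoRA
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,                      # lora_r 32, as stated on the card
    lora_alpha=32,             # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)
```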
Good For
- Applications requiring deep understanding of long documents or conversations.
- Tasks where maintaining context over many turns or paragraphs is crucial.
- Use cases benefiting from a model optimized for long-range dependencies.
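For inference, the checkpoint should load like any other causal LM in transformers. A minimal usage sketch, assuming no special prompt template is required (the AEZAKMI prompt format is not covered in this excerpt of the card):

```python
# Minimal inference sketch; assumes the checkpoint loads as a standard
# causal LM via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "adamo1139/LWM-7B-1M-1000000ctx-AEZAKMI-3_1-1702"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Feed a long document followed by a question that depends on earlier context.
prompt = "<long document text>\n\nQuestion: summarize the key points above.\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```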