Overview
This model, Jaume-inLab/vocabulary_sliced_CA-ES-EN-qwen3-14B, is a specialized version of the original Qwen/Qwen3-14B model. Its core innovation is a vocabulary pruning strategy designed to improve efficiency for multilingual applications focused on Catalan, Spanish, and English.
Key Features & Optimizations
- Reduced VRAM Footprint: The vocabulary has been reduced from 152,064 to 134,943 tokens (about 17,000 entries removed). This yields substantial memory savings: a theoretical estimate of ~1.06 GB, and observed real-world savings of ~1.90 GB during forward passes with long sequences (e.g., 8192 tokens).
- Multilingual Focus: The pruning process specifically retained tokens relevant to Catalan, Spanish, and English, along with essential special tokens and base bytes, making it highly efficient for these languages.
- LoRA Compatibility: The model remains fully compatible with LoRA adapters trained on the original Qwen/Qwen3-14B, provided those adapters target only the attention and MLP layers, whose weight shapes are unaffected by vocabulary pruning. This allows seamless reuse of existing adapters without performance degradation.
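The LoRA compatibility condition above can be expressed as a simple check: an adapter stays valid as long as none of its target modules depend on vocabulary size. The sketch below is illustrative, not an official tool; the module names follow Qwen3's conventional naming (`q_proj`, `gate_proj`, etc.) and are assumptions about a given adapter's configuration.

```python
# Sketch: check that a LoRA adapter only targets layers whose weight
# shapes are unchanged by vocabulary pruning. Module names follow
# Qwen3's naming; treat this as illustrative, not authoritative.

# Layers whose weight shapes depend on vocabulary size (pruned here).
VOCAB_DEPENDENT = {"embed_tokens", "lm_head"}

# Attention and MLP projections are shape-stable under vocab pruning.
SAFE_TARGETS = {"q_proj", "k_proj", "v_proj", "o_proj",
                "gate_proj", "up_proj", "down_proj"}

def adapter_is_compatible(target_modules):
    """Return True if every targeted module is vocabulary-independent."""
    return all(m in SAFE_TARGETS for m in target_modules)

print(adapter_is_compatible(["q_proj", "v_proj"]))   # True
print(adapter_is_compatible(["q_proj", "lm_head"]))  # False
```

In practice, the module list would come from the adapter's configuration (e.g., the `target_modules` field of a PEFT `LoraConfig`); an adapter touching `embed_tokens` or `lm_head` would be shape-mismatched against the pruned model.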
Use Cases & Benefits
This model is particularly beneficial for:
- Memory-Constrained Environments: Ideal for training and inference where VRAM is a critical factor.
- Long Sequence Processing: Optimized for tasks requiring long context lengths, such as Direct Preference Optimization (DPO), due to reduced logits tensor size.
- Multilingual Applications: Provides an efficient base for applications primarily involving Catalan, Spanish, and English text generation and understanding.
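The long-sequence benefit can be quantified with back-of-envelope arithmetic on the logits tensor, whose last dimension equals the vocabulary size. The sketch below assumes bfloat16 logits (2 bytes each); it illustrates the scaling only and does not attempt to reproduce the exact GB figures above, which also depend on batch size, dtype, and framework-internal buffers.

```python
# Back-of-envelope estimate of logits-tensor savings from vocabulary
# pruning. Illustrative only: real savings also depend on batch size,
# dtype, and intermediate activations.

ORIG_VOCAB = 152_064
PRUNED_VOCAB = 134_943
BYTES_PER_LOGIT = 2  # assuming bfloat16 logits

def logits_bytes(seq_len, vocab_size, batch=1, bytes_per=BYTES_PER_LOGIT):
    """Memory of the [batch, seq_len, vocab_size] logits tensor."""
    return batch * seq_len * vocab_size * bytes_per

saved = logits_bytes(8192, ORIG_VOCAB) - logits_bytes(8192, PRUNED_VOCAB)
print(f"Saved per 8192-token sequence: {saved / 2**30:.2f} GiB")
# → Saved per 8192-token sequence: 0.26 GiB
```

For DPO specifically, logits are computed for both chosen and rejected sequences under both the policy and the reference model, so this per-sequence saving compounds several times within each training step.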