Jaume-inLab/vocabulary_sliced_CA-ES-EN-qwen3-14B

Text generation · Model size: 14B · Quantization: FP8 · Context length: 32k · Published: Feb 20, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Jaume-inLab/vocabulary_sliced_CA-ES-EN-qwen3-14B is a 14-billion-parameter Qwen3-based causal language model with a context length of 32,768 tokens. Developed by Jaume-inLab, it features a vocabulary pruned specifically for Catalan, Spanish, and English. Its primary differentiator is a reduced memory footprint, yielding significant VRAM savings during training and inference, particularly for tasks such as DPO with long sequences.


Overview

This model, Jaume-inLab/vocabulary_sliced_CA-ES-EN-qwen3-14B, is a specialized version of the original Qwen/Qwen3-14B. Its core innovation is a vocabulary pruning strategy designed to improve efficiency for multilingual applications focused on Catalan, Spanish, and English.
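
If the checkpoint follows the usual Qwen3 layout, it can be loaded with the standard Hugging Face transformers API. The snippet below is a minimal sketch; the prompt and generation settings are illustrative and not taken from this card.

```python
# Minimal loading sketch, assuming the standard transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jaume-inLab/vocabulary_sliced_CA-ES-EN-qwen3-14B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # place layers on available devices automatically
)

# Catalan prompt as an example of one of the model's target languages.
prompt = "Explica breument què és la intel·ligència artificial."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```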

Key Features & Optimizations

  • Reduced VRAM Footprint: The vocabulary has been reduced from 152,064 to approximately 134,943 tokens. This yields substantial memory savings, with a theoretical estimate of ~1.06 GB and observed real-world savings of ~1.90 GB during forward passes over long sequences (e.g., 8192 tokens).
  • Multilingual Focus: The pruning process specifically retained tokens relevant to Catalan, Spanish, and English, along with essential special tokens and base bytes, making it highly efficient for these languages.
  • LoRA Compatibility: The model remains 100% compatible with LoRA adapters trained on the original Qwen/Qwen3-14B, provided the adapters target only the attention and MLP layers (see the adapter-loading sketch after this list). This allows existing adapters to be reused without performance degradation.
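
The following is a hedged sketch of reusing such an adapter with the peft library; the adapter id shown is hypothetical and only stands in for an adapter trained on Qwen/Qwen3-14B.

```python
# Sketch of attaching a LoRA adapter trained on Qwen/Qwen3-14B, assuming the
# standard peft API; "your-org/qwen3-14b-lora-adapter" is a hypothetical id.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Jaume-inLab/vocabulary_sliced_CA-ES-EN-qwen3-14B",
    torch_dtype="auto",
    device_map="auto",
)

# This only works if the adapter targets attention/MLP projections
# (e.g. q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
# and does NOT touch the embedding matrix or lm_head, whose shapes changed
# when the vocabulary was pruned.
model = PeftModel.from_pretrained(base, "your-org/qwen3-14b-lora-adapter")
```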

Use Cases & Benefits

This model is particularly beneficial for:

  • Memory-Constrained Environments: Ideal for training and inference where VRAM is a critical factor.
  • Long Sequence Processing: Optimized for tasks requiring long context, such as Direct Preference Optimization (DPO), thanks to the reduced size of the logits tensor (a rough estimate is sketched after this list).
  • Multilingual Applications: Provides an efficient base for applications primarily involving Catalan, Spanish, and English text generation and understanding.
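
To see why the smaller vocabulary matters for long-sequence training, the sketch below estimates the memory of a single logits tensor for the original and pruned vocabulary sizes. It assumes fp32 logits (4 bytes per element), batch size 1, and an 8192-token sequence; these assumptions are illustrative and not taken from the card. DPO additionally computes logits for both chosen and rejected sequences (and often for a reference model), so the per-pass savings compound.

```python
# Back-of-the-envelope estimate of logits-tensor memory for one forward pass,
# assuming fp32 logits, batch size 1, and a sequence length of 8192 tokens.
def logits_gib(seq_len: int, vocab_size: int, bytes_per_elem: int = 4, batch: int = 1) -> float:
    return batch * seq_len * vocab_size * bytes_per_elem / 1024**3

original = logits_gib(8192, 152_064)   # ~4.64 GiB with the full Qwen3 vocabulary
pruned   = logits_gib(8192, 134_943)   # ~4.12 GiB with the pruned vocabulary
print(f"original: {original:.2f} GiB, pruned: {pruned:.2f} GiB, saved: {original - pruned:.2f} GiB")
```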