Jaume-inLab/vocabulary_sliced_CA-ES-EN-qwen3-14B

Text generation · Model size: 14B · Quantization: FP8 · Context length: 32k · Published: Feb 20, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Jaume-inLab/vocabulary_sliced_CA-ES-EN-qwen3-14B is a 14-billion-parameter Qwen3-based causal language model with a context length of 32,768 tokens. Developed by Jaume-inLab, it features a vocabulary pruned specifically for Catalan, Spanish, and English. Its primary differentiator is a reduced memory footprint, yielding significant VRAM savings during training and inference, particularly for tasks such as DPO with long sequences.


Overview

This model, Jaume-inLab/vocabulary_sliced_CA-ES-EN-qwen3-14B, is a specialized version of the original Qwen/Qwen3-14B. Its core innovation is a vocabulary pruning strategy designed to improve efficiency for multilingual applications focused on Catalan, Spanish, and English.
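
If the checkpoint follows the usual Qwen3 layout, it can be loaded with the standard Hugging Face transformers API. The snippet below is a minimal sketch; the prompt and generation settings are illustrative and not taken from this card.

```python
# Minimal loading sketch, assuming the standard transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jaume-inLab/vocabulary_sliced_CA-ES-EN-qwen3-14B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # place layers on available devices automatically
)

# Catalan prompt as an example of one of the model's target languages.
prompt = "Explica breument què és la intel·ligència artificial."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```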

Key Features & Optimizations

  • Reduced VRAM Footprint: The vocabulary has been reduced from 152,064 to approximately 134,943 tokens. This yields substantial memory savings, with a theoretical estimate of ~1.06 GB and observed real-world savings of ~1.90 GB during forward passes over long sequences (e.g., 8192 tokens).
  • Multilingual Focus: The pruning process specifically retained tokens relevant to Catalan, Spanish, and English, along with essential special tokens and base bytes, making it highly efficient for these languages.
  • LoRA Compatibility: The model remains 100% compatible with LoRA adapters trained on the original Qwen/Qwen3-14B, provided the adapters target only the attention and MLP layers (see the adapter-loading sketch after this list). This allows existing adapters to be reused without performance degradation.
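
The following is a hedged sketch of reusing such an adapter with the peft library; the adapter id shown is hypothetical and only stands in for an adapter trained on Qwen/Qwen3-14B.

```python
# Sketch of attaching a LoRA adapter trained on Qwen/Qwen3-14B, assuming the
# standard peft API; "your-org/qwen3-14b-lora-adapter" is a hypothetical id.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Jaume-inLab/vocabulary_sliced_CA-ES-EN-qwen3-14B",
    torch_dtype="auto",
    device_map="auto",
)

# This only works if the adapter targets attention/MLP projections
# (e.g. q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
# and does NOT touch the embedding matrix or lm_head, whose shapes changed
# when the vocabulary was pruned.
model = PeftModel.from_pretrained(base, "your-org/qwen3-14b-lora-adapter")
```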

Use Cases & Benefits

This model is particularly beneficial for:

  • Memory-Constrained Environments: Ideal for training and inference where VRAM is a critical factor.
  • Long Sequence Processing: Optimized for tasks requiring long context, such as Direct Preference Optimization (DPO), thanks to the reduced size of the logits tensor (a rough estimate is sketched after this list).
  • Multilingual Applications: Provides an efficient base for applications primarily involving Catalan, Spanish, and English text generation and understanding.
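
To see why the smaller vocabulary matters for long-sequence training, the sketch below estimates the memory of a single logits tensor for the original and pruned vocabulary sizes. It assumes fp32 logits (4 bytes per element), batch size 1, and an 8192-token sequence; these assumptions are illustrative and not taken from the card. DPO additionally computes logits for both chosen and rejected sequences (and often for a reference model), so the per-pass savings compound.

```python
# Back-of-the-envelope estimate of logits-tensor memory for one forward pass,
# assuming fp32 logits, batch size 1, and a sequence length of 8192 tokens.
def logits_gib(seq_len: int, vocab_size: int, bytes_per_elem: int = 4, batch: int = 1) -> float:
    return batch * seq_len * vocab_size * bytes_per_elem / 1024**3

original = logits_gib(8192, 152_064)   # ~4.64 GiB with the full Qwen3 vocabulary
pruned   = logits_gib(8192, 134_943)   # ~4.12 GiB with the pruned vocabulary
print(f"original: {original:.2f} GiB, pruned: {pruned:.2f} GiB, saved: {original - pruned:.2f} GiB")
```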