KasparZ/mtext-20251122_qwen3-14b-base_merged
KasparZ/mtext-20251122_qwen3-14b-base_merged is a 14-billion-parameter causal language model developed by KasparZ. It was fine-tuned with LoRA, targeting a specific set of projection modules, on the custom dataset KasparZ/mtext-111025. The model supports a 32768-token context length and is optimized for causal language modeling, making it suitable for applications that require robust long-form text generation and understanding.
Model Overview
KasparZ/mtext-20251122_qwen3-14b-base_merged was fine-tuned with a LoRA configuration targeting the q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj modules, with the embed_tokens and lm_head modules saved in full. Training used the custom dataset KasparZ/mtext-111025, with preprocessing that added two new special tokens (<|s|> and <|e|>) and adjusted the tokenizer's padding; saving embed_tokens and lm_head in full accommodates the vocabulary extended by these tokens.
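A minimal loading sketch with the transformers library, assuming the merged weights and the extended tokenizer are published under the model id above (device_map="auto" additionally requires accelerate):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KasparZ/mtext-20251122_qwen3-14b-base_merged"

# Load the tokenizer; the custom special tokens (<|s|>, <|e|>) should
# already be registered if the fine-tuned tokenizer was saved with the model.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the merged 14B checkpoint, letting transformers pick the stored dtype
# and place layers across the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Sanity check: both special tokens should map to valid (non-UNK) ids.
print(tokenizer.convert_tokens_to_ids(["<|s|>", "<|e|>"]))
```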
Key Training Details
- LoRA Configuration: r=16, lora_alpha=32, lora_dropout=0.05, use_rslora=True.
- Hyperparameters: per_device_train_batch_size=1, gradient_accumulation_steps=8, num_train_epochs=2, learning_rate=1e-4, weight_decay=0.01, max_grad_norm=0.5.
- Context Length: 32768 tokens.
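These settings map directly onto peft and transformers configuration objects. A sketch reconstructing them (task_type and output_dir are illustrative assumptions; the numeric values and module lists come from the card):

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA setup as described above; target and saved modules follow the card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    use_rslora=True,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",  # assumed task type for a causal LM
)

# Trainer hyperparameters matching the listed values; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="mtext-qwen3-14b-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    learning_rate=1e-4,
    weight_decay=0.01,
    max_grad_norm=0.5,
)
```

With a per-device batch size of 1 and 8 gradient-accumulation steps, the effective batch size is 8 sequences per device.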
Potential Use Cases
While the card does not detail specific downstream use cases, the model's architecture and training suggest suitability for the following (a basic generation sketch follows the list):
- Causal language modeling tasks.
- Applications benefiting from a 14B parameter model with a large context window.
- Further fine-tuning for specialized text generation or understanding tasks.
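As a basic usage illustration, a generation sketch (the prompt and sampling parameters are arbitrary placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KasparZ/mtext-20251122_qwen3-14b-base_merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Plain causal-LM continuation of a prompt.
inputs = tokenizer(
    "Large language models with long context windows",
    return_tensors="pt",
).to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```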