KasparZ/mtext-20251122_qwen3-14b-base_merged_modified_special

Text generation · Concurrency cost: 1 · Model size: 14B · Quantization: FP8 · Context length: 32k · Published: Jan 15, 2026 · Architecture: Transformer

KasparZ/mtext-20251122_qwen3-14b-base_merged_modified_special is a 14-billion-parameter causal language model, fine-tuned with LoRA and with targeted modifications to its token embeddings and training procedure. It was trained on the KasparZ/mtext-111025 dataset at a context length of 32768 tokens. The model is intended for general causal language modeling; its modified vocabulary and training configuration may make it better suited to data resembling its fine-tuning corpus.


Model Overview

This model, KasparZ/mtext-20251122_qwen3-14b-base_merged_modified_special, is a 14-billion-parameter causal language model. It was fine-tuned with a LoRA configuration targeting the q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj modules, with embed_tokens and lm_head additionally saved as fully trained modules. The training process added two new special tokens (<|s|> and <|e|>) and resized the token embeddings accordingly.
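The card does not document the exact script used for the vocabulary change, but with Hugging Face transformers it is typically done as in the sketch below; the base checkpoint name (Qwen/Qwen3-14B-Base) is an assumption inferred from the model name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base checkpoint is an assumption; the card only states a 14B Qwen3 base model.
base_model = "Qwen/Qwen3-14B-Base"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Register the new special tokens mentioned on the card (<|s|>, <|e|>).
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|s|>", "<|e|>"]}
)

# Resize the embedding matrix so the new token ids have embedding rows.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```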

Training Details

  • Training Data: The model was trained on the KasparZ/mtext-111025 dataset.
  • LoRA Configuration: Key parameters include r=16, lora_alpha=32, lora_dropout=0.05, and use_rslora=True (see the configuration sketch after this list).
  • Hyperparameters: Training used a learning rate of 1e-4, num_train_epochs=2, gradient_accumulation_steps=8, and warmup_ratio=0.03.
  • Context Length: The model supports a context length of 32768 tokens.
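These values map directly onto a PEFT LoraConfig and transformers TrainingArguments. The sketch below reconstructs that configuration under the assumption that a standard PEFT fine-tuning loop was used; anything not listed on the card (batch size, output directory) is a placeholder:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings listed on the card; rsLoRA rescales the update by alpha/sqrt(r).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    use_rslora=True,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Embedding and output head are trained fully and saved with the adapter,
    # which is needed after resizing the vocabulary for the new special tokens.
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)

# Hyperparameters listed on the card; batch size and output_dir are assumptions.
training_args = TrainingArguments(
    output_dir="mtext-20251122-qwen3-14b",
    learning_rate=1e-4,
    num_train_epochs=2,
    gradient_accumulation_steps=8,
    warmup_ratio=0.03,
    per_device_train_batch_size=1,  # not stated on the card
)
```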

Potential Use Cases

While specific direct uses are not detailed, the model's causal language modeling objective and fine-tuning approach suggest applicability in the areas below (a brief loading and generation sketch follows the list):

  • Text generation and completion.
  • Tasks requiring understanding and generation within a large context window.
  • Further fine-tuning for specialized downstream NLP applications.
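For text generation, the merged checkpoint can be loaded like any other causal language model with transformers. The sampling parameters below are illustrative defaults, not settings recommended by the model author:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KasparZ/mtext-20251122_qwen3-14b-base_merged_modified_special"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # dtype is an assumption; the hosted quant is FP8
    device_map="auto",
)

prompt = "The history of language modeling"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Plain sampling for text completion; tune these values for your use case.
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```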