KasparZ/mtext-20251122_qwen3-14b-base_merged

Text generation · Model size: 14B · Quantization: FP8 · Context length: 32k · Published: Nov 22, 2025 · Architecture: Transformer

KasparZ/mtext-20251122_qwen3-14b-base_merged is a 14-billion-parameter causal language model developed by KasparZ. It was fine-tuned with LoRA on the custom dataset KasparZ/mtext-111025, with the adapter weights merged back into the base model. Its 32,768-token context length makes it suitable for applications that require robust long-context text generation and understanding.


Model Overview

This model, KasparZ/mtext-20251122_qwen3-14b-base_merged, is a 14-billion-parameter causal language model. It was fine-tuned with a LoRA configuration targeting the q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj modules, with the embed_tokens and lm_head modules saved in full. Training used the custom dataset KasparZ/mtext-111025 and included preprocessing steps such as adding new special tokens (<|s|>, <|e|>) and adjusting tokenizer padding.
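The special-token preprocessing described above can be reproduced with standard transformers calls. The sketch below is an assumed reconstruction, not an excerpt from the actual training code: the base checkpoint name is inferred from the repository name, and the padding settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed base checkpoint, inferred from the repository name.
base_id = "Qwen/Qwen3-14B-Base"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Register the new special tokens mentioned in the model card.
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|s|>", "<|e|>"]}
)

# Grow the embedding matrix so the new tokens get rows; this is why
# embed_tokens and lm_head had to be saved alongside the LoRA adapter.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

# Illustrative padding adjustment: reuse an existing token and left-pad,
# the usual choice for causal generation.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
```

Saving embed_tokens and lm_head in full (rather than as low-rank deltas) is the standard PEFT pattern whenever the vocabulary is extended, since the new token rows must be trained and stored outright.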

Key Training Details

  • LoRA Configuration: r=16, lora_alpha=32, lora_dropout=0.05, use_rslora=True (see the configuration sketch after this list).
  • Hyperparameters: per_device_train_batch_size=1, gradient_accumulation_steps=8, num_train_epochs=2, learning_rate=1e-4, weight_decay=0.01, max_grad_norm=0.5.
  • Context Length: The model supports a context length of 32,768 tokens.
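Taken together, the listed settings map directly onto peft's LoraConfig and transformers' TrainingArguments. The following is a plausible reconstruction under those assumptions; the task type, output directory, and trainer wiring are not stated in the model card.

```python
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

# LoRA settings from the model card; task_type is an assumption.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    use_rslora=True,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Train and save these modules in full, as noted in the overview.
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)

# Hyperparameters from the model card; output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="mtext-20251122_qwen3-14b-base",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    learning_rate=1e-4,
    weight_decay=0.01,
    max_grad_norm=0.5,
)

# model = get_peft_model(base_model, lora_config)  # wrap the prepared base model
```

Note that with a per-device batch size of 1 and 8 gradient-accumulation steps, the effective batch size is 8 sequences per optimizer step per device.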

Potential Use Cases

While the card does not detail specific direct use cases, the model's architecture and training suggest suitability for the following (a basic generation sketch appears after the list):

  • Causal language modeling tasks.
  • Applications benefiting from a 14B parameter model with a large context window.
  • Further fine-tuning for specialized text generation or understanding tasks.
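For basic inference, the merged checkpoint should load through the standard transformers API like any causal LM. The following is a minimal sketch: the dtype, device placement, prompt format, and sampling settings are all assumptions, including the use of <|s|> as a start marker.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KasparZ/mtext-20251122_qwen3-14b-base_merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; the card lists FP8 only as a serving quant
    device_map="auto",
)

# Prompt format is a guess based on the <|s|>/<|e|> tokens described above.
inputs = tokenizer("<|s|>Once upon a time", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```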