Steven10429/qwen14-2wc1p-eos-3-merge

TEXT GENERATION

Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Feb 15, 2025 · License: other · Architecture: Transformer

Steven10429/qwen14-2wc1p-eos-3-merge is a 14.8-billion-parameter language model developed by Steven10429, built on a Qwen1.5 base architecture. The model supports a 131,072-token context length and was trained through iterations focused on improving generation length and controlling End-Of-Sequence (EOS) token behavior. It is intended for applications that need robust language understanding and generation with controlled output length.


Model Overview

Steven10429/qwen14-2wc1p-eos-3-merge is a 14.8-billion-parameter language model developed by Steven10429. It merges the base model Steven10429/qwen14b-2wc1p-pj3ha_qwen14b-generic-eos-2 with the LoRA adapter Steven10429/qwen14-2wc1p-eos-3. The model supports several quantization methods, including Q2_K, Q4_K, IQ4_NL, Q5_K_M, Q6_K, and Q8_0, making it adaptable to different deployment scenarios.
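The merge step described above can be illustrated numerically. In a standard LoRA merge, the adapter's low-rank update is folded into the base weight matrix as W' = W + (alpha / r) · B · A, after which inference needs no extra adapter computation. A minimal sketch with hypothetical toy shapes and values (not taken from this model's actual weights):

```python
# Toy LoRA merge: W' = W + (alpha / r) * (B @ A).
# Shapes and values here are hypothetical, chosen for readability.

def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

d, r, alpha = 2, 1, 16          # hidden size, LoRA rank, LoRA alpha
W = [[1.0, 0.0], [0.0, 1.0]]    # base weight (identity for clarity)
B = [[0.5], [0.0]]              # LoRA up-projection, d x r
A = [[0.2, 0.4]]                # LoRA down-projection, r x d

scaling = alpha / r             # = 16.0
BA = matmul(B, A)
W_merged = [[W[i][j] + scaling * BA[i][j] for j in range(d)]
            for i in range(d)]
print(W_merged)
```

Since the update is folded in once, the merged checkpoint behaves like an ordinary dense model and can then be quantized to the formats listed above.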

Key Training Iterations

This specific iteration of the model focused on several key improvements:

  • EOS Activation: The End-Of-Sequence (EOS) token was explicitly enabled during training.
  • Learning Rate Adjustment: The learning rate was reduced to refine the training process.
  • Epochs: Training was conducted over 3 epochs.
  • Generation Length Improvement: A primary goal of this iteration was improving the model's ability to generate longer, more coherent outputs.
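Enabling the EOS token during training typically means appending the tokenizer's EOS id to each training sequence, so the loss teaches the model when to stop. A minimal sketch of that data-preparation step, using an assumed EOS id for illustration (the real id comes from the tokenizer config):

```python
# Hypothetical token ids; the actual EOS id is defined by the
# model's tokenizer config, not by this sketch.
EOS_ID = 151643  # assumed id for illustration

def add_eos(token_ids, eos_id=EOS_ID):
    """Append EOS so the model learns an explicit stop signal."""
    if token_ids and token_ids[-1] == eos_id:
        return token_ids          # already terminated
    return token_ids + [eos_id]

sample = [1001, 1002, 1003]
print(add_eos(sample))  # [1001, 1002, 1003, 151643]
```

Training with a consistent terminal EOS is a common way to curb runaway generations while still allowing long outputs before the stop signal.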

Future Development Focus

Future development plans include:

  • Randomized EOS: Exploring a 0.3 probability for random EOS token appearance.
  • Reduced EOS Probability: Further reducing the probability of EOS to fine-tune output control.
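The card does not specify how the randomized EOS would be implemented; one plausible reading is appending EOS to training sequences only with probability 0.3, so the stop signal appears on a random subset of samples. A toy sketch under that assumption (EOS id and function name are hypothetical):

```python
import random

EOS_ID = 151643  # assumed EOS token id, for illustration only

def maybe_append_eos(token_ids, p=0.3, rng=random):
    """Append EOS with probability p, so only some training
    sequences carry an explicit stop signal."""
    if rng.random() < p:
        return token_ids + [EOS_ID]
    return token_ids

rng = random.Random(42)
sample = maybe_append_eos([1001, 1002, 1003], rng=rng)
print(sample)
```

Lowering p further, as the second plan item suggests, would make EOS rarer in training and should bias the model toward longer generations.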

Use Cases

This model is suitable for applications where controlled generation length and specific EOS behavior are important. Its 14.8B parameters and 131,072 token context window make it capable of handling complex language tasks.