KKHYA/qwen3-14b-fft-if
KKHYA/qwen3-14b-fft-if is a 14 billion parameter language model fine-tuned from Qwen/Qwen3-14B, featuring a 32768 token context length. This model has been specifically fine-tuned across multiple datasets including mft_tulu3_personas_if, mft_oasst1, mft_oasst2, mft_coconot, mft_aya, and mft_daring_anteater. It is designed for general language understanding and generation tasks, leveraging its diverse training data for broad applicability.
Loading preview...
Overview
KKHYA/qwen3-14b-fft-if is a 14 billion parameter large language model, building upon the robust Qwen3-14B architecture. It distinguishes itself through a comprehensive fine-tuning process across a diverse set of datasets, including mft_tulu3_personas_if, mft_oasst1, mft_oasst2, mft_coconot, mft_aya, and mft_daring_anteater. This extensive training aims to enhance its general conversational abilities and instruction following.
Key Capabilities
- General Language Understanding: Benefits from the Qwen3 base model's strong foundation in comprehending complex language patterns.
- Instruction Following: Improved through fine-tuning on instruction-based datasets like mft_oasst1 and mft_oasst2.
- Conversational Fluency: Enhanced by datasets focused on persona-based interactions and general dialogue.
- Broad Applicability: The diverse training data suggests suitability for a wide range of natural language processing tasks.
Training Details
The model was trained with a learning rate of 1e-05, a total batch size of 128, and utilized a cosine learning rate scheduler with a 0.1 warmup ratio over 2 epochs. This configuration, combined with an AdamW optimizer, aims for stable and effective learning across the varied fine-tuning datasets.
Good For
- General-purpose chatbots: Its fine-tuning on conversational and instruction datasets makes it suitable for interactive applications.
- Content generation: Can be used for generating text, summaries, or creative content based on prompts.
- Research and experimentation: Provides a solid base for further fine-tuning on more specific downstream tasks.