sulthankris/WAIANG-Qwen3-4B
The sulthankris/WAIANG-Qwen3-4B is a 4 billion parameter Qwen3 model, developed by sulthankris, fine-tuned using Unsloth and Huggingface's TRL library. It was trained on a synthetic dataset, emphasizing efficient training. This model is suitable for applications requiring a compact yet capable language model with a 40960 token context length.
Loading preview...
Model Overview
The sulthankris/WAIANG-Qwen3-4B is a 4 billion parameter language model based on the Qwen3 architecture, developed by sulthankris. It was fine-tuned from unsloth/qwen3-4b-bnb-4bit using the Unsloth library, which enabled 2x faster training, and Huggingface's TRL library.
Key Characteristics
- Architecture: Qwen3-4B
- Parameter Count: 4 billion
- Context Length: 40960 tokens
- Training: Fine-tuned on a 21.5 MB synthetic dataset, primarily generated from
tngtech/DeepSeek-TNG-R1T2-Chimera. - Efficiency: Leverages Unsloth for accelerated training.
Potential Use Cases
This model is well-suited for applications where a compact and efficiently trained language model is beneficial. Its fine-tuning on a synthetic dataset suggests potential for tasks aligned with the data's characteristics, offering a balance between performance and resource efficiency.