welyty/qwen3-4b-alpaca-chatwithme

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Apr 10, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

welyty/qwen3-4b-alpaca-chatwithme is a 4 billion parameter Qwen3-4B model fine-tuned by welyty using LoRA on the Alpaca dataset. This model is optimized for instruction-following conversations, demonstrating a final training loss of 1.0875 and a perplexity of approximately 3.00. It is designed for conversational AI applications requiring a compact yet capable instruction-tuned language model with a 32768 token context length.

Loading preview...

Model Overview

welyty/qwen3-4b-alpaca-chatwithme is a 4 billion parameter language model built upon the Qwen/Qwen3-4B base architecture. It has been fine-tuned by welyty using the LoRA (Low-Rank Adaptation) method on the yahma/alpaca-cleaned dataset, which comprises approximately 52,000 instruction-following examples. This fine-tuning process, conducted over one epoch with 4-bit quantization and bf16 precision, aimed to enhance the model's conversational capabilities and instruction-following accuracy.

Key Characteristics

  • Base Model: Qwen/Qwen3-4B
  • Fine-tuning: LoRA (r=8, alpha=16) on Alpaca dataset
  • Context Length: 32768 tokens
  • Training Metrics: Final training loss of 1.0875, validation loss of 1.0976, and an estimated perplexity of ~3.00.
  • Chat Format: Utilizes the ChatML format for structured conversations.
  • Efficiency: LoRA fine-tuning involved only about 13 million trainable parameters, representing 0.3% of the total model parameters, making the adaptation efficient.

Good For

  • Instruction-Following: Excels at understanding and responding to user instructions in a conversational context.
  • Conversational AI: Suitable for chatbots, virtual assistants, and interactive applications where clear and coherent dialogue is essential.
  • Resource-Efficient Deployment: Its 4 billion parameter size, combined with efficient LoRA fine-tuning, makes it a viable option for scenarios requiring a balance between performance and computational resources.