Model Overview
Abdullah-Taha/UTN-Qwen3-0.6B-LoRA-merged is a specialized language model built on the Qwen3-0.6B base architecture. It was fine-tuned with LoRA (Low-Rank Adaptation) using rank r=64 and scaling factor alpha=128, and the adapter weights were subsequently merged into the base model, so it supports direct inference without the PEFT library. The model is tailored to the domain of the University of Technology Nuremberg (UTN).
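The merge step can be illustrated numerically. The following is a minimal sketch (using NumPy and small illustrative dimensions, not the model's actual shapes) of how a LoRA update with rank r and scaling alpha folds into a base weight matrix:

```python
import numpy as np

# Illustrative dimensions only; the actual model uses r=64, alpha=128
# and much larger weight matrices.
d_out, d_in, r, alpha = 16, 16, 4, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # LoRA down-projection
B = rng.standard_normal((d_out, r))      # LoRA up-projection

# Merging folds the scaled low-rank update into the base weight,
# so inference no longer needs the adapter matrices.
W_merged = W + (alpha / r) * (B @ A)

# The merged matrix produces the same output as base + adapter.
x = rng.standard_normal((3, d_in))
y_adapter = x @ W.T + (alpha / r) * (x @ A.T) @ B.T
y_merged = x @ W_merged.T
assert np.allclose(y_adapter, y_merged)
```

This equivalence is why the merged checkpoint behaves identically to base-plus-adapter inference while loading as a single set of weights.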
Key Capabilities
- Domain-Specific Expertise: Fine-tuned on 1,289 UTN Q&A pairs, enabling it to provide relevant and accurate information regarding the University of Technology Nuremberg.
- Efficient Inference: The LoRA weights are merged into the base model, simplifying deployment and allowing for direct use with standard Hugging Face Transformers pipelines.
- Compact Size: With roughly 0.6 billion parameters, it balances performance and computational efficiency, making it suitable for resource-constrained deployments.
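Because the adapter is already merged, the model loads like any standard causal LM. A minimal usage sketch with the Hugging Face Transformers pipeline API (the example prompt is hypothetical):

```python
MODEL_ID = "Abdullah-Taha/UTN-Qwen3-0.6B-LoRA-merged"

def build_generator():
    # Deferred import keeps this sketch importable without transformers installed.
    from transformers import pipeline
    # A standard text-generation pipeline suffices; no PEFT wrapper is
    # needed because the LoRA weights are merged into the base model.
    return pipeline("text-generation", model=MODEL_ID)

# Example usage (downloads the model on first run):
# generator = build_generator()
# print(generator("What degree programs does UTN offer?",
#                 max_new_tokens=128)[0]["generated_text"])
```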
Training Details
The model was trained for 5 epochs with a learning rate of 3e-4 on an NVIDIA A40 GPU. On a validation set of 17 examples, it achieved a ROUGE-1 score of 0.5924, ROUGE-2 of 0.4967, and ROUGE-L of 0.5687, indicating that it generates relevant and coherent responses within its specialized domain.
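For context, ROUGE-1 measures unigram overlap between a generated answer and a reference answer. A minimal pure-Python illustration of the ROUGE-1 F1 computation (toy sentences, not the actual evaluation data or the exact scorer used):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each unigram counts at most as often as it
    # appears in the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy example: precision 3/3, recall 3/6, so F1 = 2/3.
score = rouge1_f1("the cat sat", "the cat sat on the mat")
```

ROUGE-2 applies the same idea to bigrams, and ROUGE-L to the longest common subsequence.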
Good For
- UTN-specific Q&A systems: Ideal for chatbots or virtual assistants designed to answer questions about the University of Technology Nuremberg.
- Information retrieval: Can be used to extract or summarize information from UTN-related texts.
- Specialized applications: Suitable for scenarios requiring a compact, domain-adapted language model focused on university-specific content.