JoanneJegou/Qwen_SFT_post_trained_v1
TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 24, 2026Architecture:Transformer Cold
JoanneJegou/Qwen_SFT_post_trained_v1 is a 2 billion parameter language model based on the Qwen3-1.7B architecture, fine-tuned with a context length of 32768 tokens. This model has undergone Supervised Fine-Tuning (SFT) using the microsoft/wikiQA and MuskumPillerum/General-Knowledge datasets, enhanced with LoRA. It is specialized for question answering and general knowledge tasks, leveraging its fine-tuned knowledge base.
Loading preview...
Model Overview
JoanneJegou/Qwen_SFT_post_trained_v1 is a 2 billion parameter language model built upon the Qwen3-1.7B base architecture. It features a substantial context window of 32768 tokens, allowing it to process and generate longer sequences of text.
Key Capabilities
- Supervised Fine-Tuning (SFT): The model has been specifically fine-tuned using Supervised Fine-Tuning (SFT) techniques.
- Knowledge-Enhanced Training: Training incorporated two distinct datasets:
microsoft/wikiQAandMuskumPillerum/General-Knowledge. This combination suggests an optimization for factual recall and question-answering tasks. - LoRA Integration: The fine-tuning process utilized LoRA (Low-Rank Adaptation) for efficient and effective adaptation of the base model.
Good For
- Question Answering: Its training on the
wikiQAdataset indicates a strong suitability for answering factual questions. - General Knowledge Tasks: The inclusion of the
MuskumPillerum/General-Knowledgedataset positions this model well for applications requiring broad factual understanding and retrieval. - Applications requiring a large context window: The 32768 token context length is beneficial for processing detailed queries or generating comprehensive responses.