JoanneJegou/Qwen_SFT_post_trained_v1

TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 24, 2026Architecture:Transformer Cold

JoanneJegou/Qwen_SFT_post_trained_v1 is a 2 billion parameter language model based on the Qwen3-1.7B architecture, fine-tuned with a context length of 32768 tokens. This model has undergone Supervised Fine-Tuning (SFT) using the microsoft/wikiQA and MuskumPillerum/General-Knowledge datasets, enhanced with LoRA. It is specialized for question answering and general knowledge tasks, leveraging its fine-tuned knowledge base.

Loading preview...

Model Overview

JoanneJegou/Qwen_SFT_post_trained_v1 is a 2 billion parameter language model built upon the Qwen3-1.7B base architecture. It features a substantial context window of 32768 tokens, allowing it to process and generate longer sequences of text.

Key Capabilities

  • Supervised Fine-Tuning (SFT): The model has been specifically fine-tuned using Supervised Fine-Tuning (SFT) techniques.
  • Knowledge-Enhanced Training: Training incorporated two distinct datasets: microsoft/wikiQA and MuskumPillerum/General-Knowledge. This combination suggests an optimization for factual recall and question-answering tasks.
  • LoRA Integration: The fine-tuning process utilized LoRA (Low-Rank Adaptation) for efficient and effective adaptation of the base model.

Good For

  • Question Answering: Its training on the wikiQA dataset indicates a strong suitability for answering factual questions.
  • General Knowledge Tasks: The inclusion of the MuskumPillerum/General-Knowledge dataset positions this model well for applications requiring broad factual understanding and retrieval.
  • Applications requiring a large context window: The 32768 token context length is beneficial for processing detailed queries or generating comprehensive responses.