Disya/magnum-qwen3-4b
Text generation · Concurrency cost: 1 · Model size: 4B · Quant: BF16 · Context length: 32k · Published: Jun 30, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights · Warm

Disya/magnum-qwen3-4b is a 4-billion-parameter language model developed by Disya and based on the Qwen3 architecture. The model was fine-tuned with Supervised Fine-Tuning (SFT), optimizing it for generating coherent, contextually relevant text from provided instructions. With a context length of 40,960 tokens, it is well suited to tasks that require extensive contextual understanding and detailed responses.


Model Overview: Disya/magnum-qwen3-4b

Disya/magnum-qwen3-4b is a 4-billion-parameter language model built on the Qwen3 architecture. It has undergone Supervised Fine-Tuning (SFT), meaning it was trained on a dataset of input-output pairs to learn specific behaviors and response patterns. The SFT process enhances the model's ability to follow instructions and generate high-quality, relevant text.
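
SFT-tuned chat models like this one are typically driven with a list of role-tagged messages, which the tokenizer's chat template flattens into a single prompt string before generation. A minimal sketch of that pattern follows; the `build_messages` helper is hypothetical and the exact template behavior depends on the checkpoint's tokenizer configuration.

```python
# Sketch of the instruction-tuned usage pattern. build_messages is a
# hypothetical convenience helper, not part of the model's API.
def build_messages(system: str, user: str) -> list[dict]:
    """Assemble a role-tagged message list in the common chat format."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_messages(
    "You are a helpful assistant.",
    "Summarize the following document in three bullet points.",
)
# With Hugging Face transformers, a list like this would typically be passed to
# tokenizer.apply_chat_template(messages, add_generation_prompt=True)
# to produce the final prompt string for the model.
print(messages[1]["role"])
```
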

Key Capabilities

  • Instruction Following: Optimized through SFT to understand and execute a wide range of instructions, producing targeted outputs.
  • Extended Context Window: Features a context length of 40,960 tokens, allowing it to process and generate text from very long inputs while maintaining coherence over extended conversations or documents.
  • Text Generation: Capable of generating diverse forms of text, from creative content to informative summaries, guided by the SFT training.
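
A large context window still has to be budgeted: the prompt and the planned generation must fit inside it together. The sketch below illustrates that arithmetic using the 40,960-token figure from this card; in a real application the token counts would come from the model's tokenizer rather than being supplied by hand.

```python
# Illustrative context-window budgeting. CONTEXT_LENGTH is the value stated on
# the model card; token counts here are placeholders for tokenizer output.
CONTEXT_LENGTH = 40_960

def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if the prompt plus the planned generation fits."""
    return prompt_tokens + max_new_tokens <= context_length

print(fits_in_context(38_000, 2_000))  # 40,000 <= 40,960 -> True
print(fits_in_context(40_000, 2_000))  # 42,000 > 40,960 -> False
```
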

Good For

  • Conversational AI: Its instruction-following capabilities and large context window make it suitable for chatbots and virtual assistants that require understanding long user queries and maintaining conversation history.
  • Content Creation: Can be used for generating articles, summaries, or creative writing pieces where specific prompts and detailed context are provided.
  • Long-form Question Answering: Excels in scenarios where answers need to be derived from extensive documents or complex information, leveraging its large context capacity.
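
For the conversational use case above, the application keeps a running message history and trims the oldest turns once the context budget is exceeded. A minimal sketch, assuming a crude word-count stand-in for real tokenization (`count_tokens` and `trim_history` are hypothetical helpers, not part of the model):

```python
# Minimal history-trimming sketch for chat applications. count_tokens is a
# word-count proxy for illustration; a real app would measure with the
# model's tokenizer.
def count_tokens(text: str) -> int:
    return len(text.split())

def trim_history(history: list[dict], budget: int) -> list[dict]:
    """Drop the oldest turns until the remaining history fits the budget."""
    while history and sum(count_tokens(m["content"]) for m in history) > budget:
        history = history[1:]
    return history

history = [
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six"},
]
trimmed = trim_history(history, 4)
print(len(trimmed))  # 2: the oldest turn is dropped to fit the budget
```

A sliding window like this is the simplest policy; production systems often also pin the system prompt or summarize dropped turns instead of discarding them outright.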