razy101/Qwen3-1.7B-GPT-5.4-Distill

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 16, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

razy101/Qwen3-1.7B-GPT-5.4-Distill is a 2-billion-parameter Qwen3-based language model developed by razy101 and fine-tuned from unsloth/Qwen3-1.7B-unsloth-bnb-4bit. It was trained with Unsloth and Hugging Face's TRL library, achieving roughly 2x faster training. With a 32,768-token context length, it is optimized for efficient deployment and for tasks that need a balance of capability and resource use.


Overview

razy101/Qwen3-1.7B-GPT-5.4-Distill is a 2-billion-parameter language model based on the Qwen3 architecture, developed by razy101. It was fine-tuned from unsloth/Qwen3-1.7B-unsloth-bnb-4bit using Unsloth together with Hugging Face's TRL library, which accelerated training by roughly 2x and makes the model an efficient option for a range of NLP tasks.
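A minimal inference sketch with Hugging Face Transformers is shown below. It assumes the checkpoint is available on the Hub under the repo id above, that the installed transformers version supports the Qwen3 architecture, and that the tokenizer ships a chat template; the prompt, dtype, and generation settings are illustrative.

```python
# Minimal inference sketch (assumptions: repo id is live on the Hub,
# transformers supports Qwen3, and the tokenizer provides a chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "razy101/Qwen3-1.7B-GPT-5.4-Distill"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the quantization listed above
    device_map="auto",
)

# Build a chat-formatted prompt and generate a response.
messages = [{"role": "user", "content": "Summarize the trade-offs of small language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```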

Key Characteristics

  • Architecture: Qwen3-based, a robust and capable foundation model.
  • Parameter Count: 2 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a context window of 32,768 tokens, suitable for processing longer inputs and generating coherent, extended outputs.
  • Training Efficiency: Fine-tuned with Unsloth for significantly faster training, reducing development time and resource consumption (see the sketch after this list).
  • License: Distributed under the Apache-2.0 license, allowing for broad use and modification.
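The sketch below shows what an Unsloth + TRL supervised fine-tuning run of this kind typically looks like. The base checkpoint matches the card; the dataset file, LoRA configuration, and hyperparameters are placeholders, not the settings actually used for this model.

```python
# Sketch of an Unsloth + TRL fine-tuning setup of the kind described above.
# The base checkpoint matches the card; dataset file, LoRA settings, and
# hyperparameters are illustrative placeholders.
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

max_seq_length = 32768  # matches the model's 32k context window

# Load the 4-bit Unsloth base checkpoint the model was fine-tuned from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-1.7B-unsloth-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters; Unsloth patches the model for faster training.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder: any dataset with a "text" column works here.
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # older TRL releases take this as `tokenizer=`
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=100,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```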

Use Cases

This model is well-suited for applications where a compact yet capable language model is required. Its efficient training and moderate size make it ideal for:

  • Resource-constrained environments: Deployments on devices or platforms with limited computational power.
  • Rapid prototyping and experimentation: Quick iteration cycles due to faster fine-tuning.
  • General text generation and understanding tasks: Summarization, question answering, content creation, and more, where the 2B parameter count provides sufficient capability.
  • Applications requiring a long context window: Tasks that benefit from processing extensive input texts, such as document analysis or complex conversational agents.