Thrillcrazyer/Qwen-7B_SFT

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32K · Published: Nov 30, 2025 · Architecture: Transformer

Thrillcrazyer/Qwen-7B_SFT is a 7.6-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B using Supervised Fine-Tuning (SFT), with a context length of 32,768 tokens. The model targets general text generation, using its SFT training to produce coherent, contextually relevant responses, and suits applications that need robust language understanding and generation on the Qwen architecture.


Thrillcrazyer/Qwen-7B_SFT Overview

This model is a 7.6 billion parameter language model, derived from the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B base model. It has undergone Supervised Fine-Tuning (SFT) using the TRL library, which is a common method for enhancing a model's ability to follow instructions and generate high-quality text.

Key Capabilities

  • General Text Generation: Capable of generating human-like text based on given prompts.
  • Instruction Following: Improved ability to understand and respond to user instructions due to SFT.
  • Context Handling: Supports a substantial context length of 32768 tokens, allowing for processing and generating longer sequences of text.
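The capabilities above can be exercised through the standard Hugging Face `transformers` loading path. The following is a minimal inference sketch, not code from the model card itself; the generation settings are illustrative, and running it requires the libraries installed plus enough GPU memory for 7.6B parameters.

```python
MODEL_ID = "Thrillcrazyer/Qwen-7B_SFT"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint with transformers and generate a reply.

    Imports are deferred so the sketch reads without the heavy
    dependencies installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Qwen-family checkpoints ship a chat template; use it to frame the prompt.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Because the model was SFT-trained for instruction following, wrapping input in the chat template (rather than passing raw text) generally yields better responses.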

Training Details

The model was trained with SFT using the TRL framework (version 0.25.1) together with Transformers 4.57.3, PyTorch 2.8.0, Datasets 4.4.1, and Tokenizers 0.22.1. The training process can be inspected via its Weights & Biases run.
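A setup like the one described could be sketched with TRL's `SFTTrainer` as below. This is a hedged reconstruction under stated assumptions: the card names only the library versions, so the dataset, batch size, and other hyperparameters here are placeholders, not the author's actual recipe.

```python
BASE_MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
MAX_LENGTH = 32768  # matches the published context length


def train_sft(dataset_name: str, output_dir: str = "Qwen-7B_SFT"):
    """Fine-tune the base model on an instruction dataset with TRL.

    All hyperparameters are illustrative placeholders; imports are
    deferred so the sketch reads without trl/datasets installed.
    """
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    train_dataset = load_dataset(dataset_name, split="train")
    config = SFTConfig(
        output_dir=output_dir,
        max_length=MAX_LENGTH,          # truncate sequences to the context size
        per_device_train_batch_size=1,  # placeholder value
        gradient_accumulation_steps=8,  # placeholder value
        bf16=True,
        report_to="wandb",              # the card links a Weights & Biases run
    )
    trainer = SFTTrainer(
        model=BASE_MODEL,               # SFTTrainer accepts a model id string
        args=config,
        train_dataset=train_dataset,
    )
    trainer.train()
    return trainer
```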

Good For

  • Applications requiring a fine-tuned Qwen-based model for various text generation tasks.
  • Developers looking for a model with a strong foundation and SFT enhancements for improved conversational or instructional performance.
  • Use cases benefiting from a 32K context window for processing longer inputs or generating more extensive outputs.