Jackrong/GPT-5-Distill-Qwen3-4B-Instruct
Jackrong/GPT-5-Distill-Qwen3-4B-Instruct is a 4-billion-parameter instruction-tuned conversational language model based on Qwen3-4B-Instruct. It was trained with Supervised Fine-Tuning (SFT) on ShareGPT data and knowledge distilled from GPT-5 responses, with the goal of replicating GPT-5's conversational style. Supporting both English and Chinese with a 32K-token context length, the model targets natural-sounding dialogue and general-knowledge tasks at low computational overhead.
Model Overview
This model, Jackrong/GPT-5-Distill-Qwen3-4B-Instruct, is a 4-billion-parameter instruction-tuned conversational LLM built on the Qwen/Qwen3-4B-Instruct-2507 base model. Its distinguishing feature is the training recipe: Supervised Fine-Tuning (SFT) on ShareGPT data combined with knowledge distillation from LMSYS GPT-5 responses. The aim is to transfer GPT-5's conversational style and quality, producing natural-sounding dialogue while keeping computational overhead low.
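For reference, below is a minimal single-turn inference sketch using the Hugging Face transformers chat-template workflow. The model ID comes from this card; the dtype, device placement, and generation settings are illustrative assumptions, not values recommended by the model authors.

```python
# Minimal inference sketch (assumptions: bf16 weights fit on the available GPU,
# and the repo ships a standard chat template, as Qwen3 instruct models do).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jackrong/GPT-5-Distill-Qwen3-4B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision for a ~4B model on one GPU
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain knowledge distillation in two sentences."}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```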
Key Capabilities
- Lightweight and Efficient: Approximately 4 billion parameters, ensuring fast inference and reduced resource consumption.
- GPT-5 Distillation-Style Responses: Designed to mimic the conversational fluency and helpfulness observed in GPT-5.
- Highly Conversational: Optimized for engaging chatbot-style interactions and rich dialogue flows.
- Multilingual Support: Handles both Chinese and English inputs and outputs (see the chat sketch after this list).
- Extended Context Window: Supports a maximum context length of 32,768 tokens.
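To illustrate the multilingual, multi-turn behavior described above, the sketch below keeps a running conversation history and mixes Chinese and English turns, re-encoding the history through the chat template each turn; the 32K context window leaves ample room for long histories. It reuses the `tokenizer` and `model` from the previous snippet, and the sampling parameters are assumptions rather than card-recommended values.

```python
# Multi-turn bilingual chat sketch: the full history is re-encoded every turn.
# Reuses `tokenizer` and `model` from the snippet above; sampling values are assumed.
history = [
    {"role": "user", "content": "用两句话介绍一下你自己。"},  # Chinese opening turn
]

for _ in range(2):
    inputs = tokenizer.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(
        inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,  # assumed sampling settings
        top_p=0.9,
    )
    reply = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
    print(reply)
    # Append the model's reply, then follow up in English.
    history.append({"role": "assistant", "content": reply})
    history.append({"role": "user", "content": "Now say that again in English."})
```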
Recommended Use Cases
- Casual chat in Chinese and English.
- General knowledge explanations and reasoning guidance.
- Code suggestions and basic debugging assistance.
- Writing assistance, including editing, summarizing, and rewriting tasks.
- Role-playing conversations with appropriately designed prompts; a prompt sketch follows this list.
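For the role-playing use case, a common pattern is to pin the persona in a system message and let user turns drive the scene. The sketch below is one hypothetical way to structure such a prompt; the persona text is made up for illustration and is not from this card. It reuses the `tokenizer` and `model` from the earlier snippets.

```python
# Role-play prompt sketch: the system message fixes the persona.
# Persona wording is a made-up example; reuses `tokenizer` and `model` from above.
roleplay = [
    {
        "role": "system",
        "content": (
            "You are Mei, a patient Mandarin tutor. Stay in character, "
            "gently correct the learner's Chinese, and reply bilingually."
        ),
    },
    {"role": "user", "content": "你好！我想练习点菜的对话。"},
]
inputs = tokenizer.apply_chat_template(
    roleplay, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```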