aitfSR4/ub-sr04-qwen3.5-4b-cpt2-sft-game
The aitfSR4/ub-sr04-qwen3.5-4b-cpt2-sft-game model is a 4.5 billion parameter Qwen3.5-based large language model developed by aitfSR4, specifically fine-tuned as a game content generator for the Sekolah Rakyat platform. Optimized for generating structured game content, this model operates as an OpenAI-compatible inference server using Unsloth. It supports a context length of 32768 tokens and is designed for efficient deployment on various GPUs, including A100, L4, and T4.
Model Overview
aitfSR4/ub-sr04-qwen3.5-4b-cpt2-sft-game is a 4.5-billion-parameter, Qwen3.5-based large language model (LLM) developed by aitfSR4. Its primary function is to generate game content for the Sekolah Rakyat platform. The model is deployed as an OpenAI-compatible inference server that uses Unsloth rather than vLLM, because the Qwen3 hybrid architecture is not compatible with vLLM.
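Because the server is OpenAI-compatible, a client can presumably talk to it with an ordinary chat completions request. The following is a minimal sketch using only the Python standard library. The base URL and port, the prompts, and the helper names (`build_chat_request`, `post_chat_completion`, `extract_content`) are assumptions for illustration and are not taken from the model card. The request and response shapes follow the usual OpenAI chat completions convention.

```python
import json
import urllib.request

# Assumption: the server listens locally on port 8000. Adjust to your deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "aitfSR4/ub-sr04-qwen3.5-4b-cpt2-sft-game"


def build_chat_request(user_prompt, system_prompt=None, max_tokens=1024, temperature=0.7):
    """Build an OpenAI-style chat completions request body as a dict."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {
        "model": MODEL_ID,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def post_chat_completion(payload, base_url=BASE_URL, timeout=120):
    """POST the payload to <base_url>/chat/completions and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_content(response):
    """Return the assistant text from an OpenAI-style chat completions response."""
    return response["choices"][0]["message"]["content"]
```

With a running server, `extract_content(post_chat_completion(build_chat_request("...")))` would return the generated text. Whether the server honors every field above (for example `temperature`) depends on the actual server.py, which is not shown here.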
Key Capabilities
- Game Content Generation: Specialized in producing structured game content, likely in JSON format, as indicated by the API examples.
- OpenAI-Compatible API: Provides a standard API interface for chat completions, making integration straightforward.
- Efficient Inference: Leverages Unsloth for optimized performance, supporting bfloat16 and 4-bit quantization for various GPU VRAM configurations.
- ChatML Template: Uses the ChatML format (<|im_start|> / <|im_end|>) for chat interactions.
- Direct JSON Output: Designed to output game JSON directly, without additional thinking blocks or markdown fences.
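To illustrate the ChatML template and the direct JSON output described in the list above, here is a small sketch. `render_chatml` shows how a message list is conventionally laid out with <|im_start|> and <|im_end|> markers, and `parse_game_json` parses a reply with `json.loads` on the assumption that it is bare JSON. Both helper names and the sample `title` field are hypothetical. The actual game schema is not documented here, and in normal use the server applies the chat template for you.

```python
import json


def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the conventional ChatML layout."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def parse_game_json(raw_reply):
    """Parse a reply that is expected to be bare JSON (no fences, no thinking block).

    Raises json.JSONDecodeError if the reply is not valid JSON.
    """
    return json.loads(raw_reply.strip())
```

Parsing directly, without stripping markdown fences first, mirrors the card's claim about the output format. A caller may still want to catch `json.JSONDecodeError` in case a generation is truncated.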
Deployment and Usage
This model is intended for deployment on platforms such as RunPod. Hardware requirements range from an A100 40GB (bfloat16) down to a T4 16GB (4-bit), with at least 20GB of storage and 16GB of CPU RAM. The provided server.py script sets up and runs the inference server. The maximum supported sequence length is 4096 tokens.
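The hardware guidance above can be turned into a simple precision choice. The sketch below is an assumption-laden illustration, not the logic of the actual server.py: the helper name `choose_load_settings` is hypothetical, and the 40GB cutoff for bfloat16 is inferred only from the two data points given (A100 40GB with bfloat16, T4 16GB with 4-bit). The card does not say which mode a GPU between those sizes should use, so the cutoff is left as a parameter.

```python
def choose_load_settings(vram_gb, bf16_min_vram_gb=40, max_seq_length=4096):
    """Pick hypothetical model-loading settings from available GPU VRAM (in GB).

    Assumption: GPUs at or above bf16_min_vram_gb load in bfloat16 and smaller
    GPUs load in 4-bit. GPUs under 16 GB fall outside the stated range.
    """
    if vram_gb < 16:
        raise ValueError("Below the smallest configuration listed (T4 16GB).")
    use_4bit = vram_gb < bf16_min_vram_gb
    return {
        "load_in_4bit": use_4bit,
        "dtype": None if use_4bit else "bfloat16",
        "max_seq_length": max_seq_length,
    }
```

The returned keys are modeled on argument names commonly seen in Unsloth loading code, but whether they can be passed through unchanged depends on the loader that server.py actually uses.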
Good for
- Developers building applications that require automated generation of game content for the Sekolah Rakyat platform.
- Projects needing an OpenAI-compatible LLM endpoint for structured text generation.
- Use cases where efficient deployment on GPUs with varying VRAM (e.g., L4, T4) is crucial.