syj4205/broken-model-fixed
Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Context Length: 32K | Published: May 9, 2026 | Architecture: Transformer
syj4205/broken-model-fixed is an 8-billion-parameter Qwen3-based causal language model with a 32K context length. It is a corrected version of a previously non-functional Qwen3-8B, specifically engineered to restore proper API server functionality. It features critical fixes to its chat template and shard mapping, making it suitable for deployment in OpenAI-compatible API environments.
syj4205/broken-model-fixed: A Repaired Qwen3-8B Model
This model, syj4205/broken-model-fixed, is a crucial repair of a previously non-functional Qwen3-8B model. The original model was unusable for /chat/completions API servers due to significant configuration errors. This fixed version addresses these issues, making the powerful Qwen3-8B architecture accessible for deployment.
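To see why a missing chat template breaks /chat/completions, consider what the template does: it turns a list of role-tagged messages into the single text string the model was trained on. The sketch below is a simplified, pure-Python rendering of the ChatML-style format Qwen models use; the real template is a Jinja2 string in `tokenizer_config.json` and also handles system defaults, tools, and other cases not shown here.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Simplified ChatML-style formatting of a chat message list.

    Illustrative only: the actual Qwen3 Jinja2 template in
    tokenizer_config.json is more elaborate, but produces this
    same <|im_start|>/<|im_end|> structure for basic chats.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model generates the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([{"role": "user", "content": "Hello!"}])
print(prompt)
```

Without a `chat_template` in the tokenizer config, an OpenAI-compatible server has no way to perform this conversion and rejects chat requests outright, which is exactly the failure this repository fixes.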
Key Fixes Implemented
- `chat_template` Addition: The official Qwen3 Jinja2 chat template was added to `tokenizer_config.json`. This is vital for OpenAI-compatible API servers (such as vLLM or FriendliAI) to correctly format user messages into model input, preventing API failures.
- Shard Mapping Correction: Errors in `model.safetensors.index.json` were resolved. Specifically, `q_proj`, `k_proj`, and `v_proj` for Layer 7 were pointing to the wrong shard, causing weight-loading errors during inference. The index has been corrected so that all tensors load from the proper shard.
- Metadata Update: The `base_model` metadata in `README.md` was corrected from `meta-llama/Meta-Llama-3.1-8B` to `Qwen/Qwen3-8B`, accurately reflecting the model's true architecture and weights.
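The shard-mapping class of error can be caught before deployment with a simple cross-check of the index file. Below is a minimal sketch: the index dictionary follows the standard `model.safetensors.index.json` layout, but the tensor names, shard file names, and the deliberately broken entry are made up for illustration (a real Qwen3-8B index maps hundreds of tensors across several shards, loaded with `json.load`).

```python
def missing_shards(index, available_files):
    """List shard files referenced by a safetensors index but absent from disk.

    A non-empty result means some tensor's weight_map entry points at a
    shard that does not exist -- the kind of mis-mapping this repo fixes.
    """
    referenced = set(index["weight_map"].values())
    return sorted(referenced - set(available_files))

# Tiny illustrative index with one bad entry, mimicking the Layer 7 issue.
index = {
    "weight_map": {
        "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
        "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
        # Mis-mapped: this shard name does not correspond to any real file.
        "model.layers.7.self_attn.v_proj.weight": "model-00003-of-00002.safetensors",
    }
}
files_on_disk = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

print(missing_shards(index, files_on_disk))
# A non-empty list here would surface at load time as a weight-loading error.
```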
What Was Not Changed
- `config.json`: The core architectural values remain consistent with the official Qwen3-8B specification.
- `tokenizer_class`: The model intentionally reuses `Qwen2Tokenizer`, as Qwen3 shares the same byte-pair encoding (BPE) scheme.
- `eos_token_id`: The end-of-sequence token IDs `[151645, 151643]` are retained, matching the official Qwen3 generation configuration.
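Because two EOS IDs are retained, any custom decoding loop should treat either token as a stop signal. A minimal sketch (the token stream below is synthetic; in Qwen3's vocabulary 151645 is `<|im_end|>` and 151643 is `<|endoftext|>`):

```python
# Both IDs from the Qwen3 generation config must terminate generation.
QWEN3_EOS_IDS = {151645, 151643}  # <|im_end|>, <|endoftext|>

def truncate_at_eos(token_ids, eos_ids=frozenset(QWEN3_EOS_IDS)):
    """Cut a generated token sequence at the first end-of-sequence token."""
    out = []
    for t in token_ids:
        if t in eos_ids:
            break
        out.append(t)
    return out

# Synthetic stream: generation stops at the first EOS id encountered,
# even though the other EOS id appears later.
print(truncate_at_eos([9906, 11, 1917, 151645, 40, 151643]))
```

Frameworks like vLLM or `transformers` handle this automatically when `eos_token_id` is a list, which is why the list form was kept unchanged.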
Ideal Use Cases
- API Server Deployment: This model is specifically designed for developers looking to deploy a functional Qwen3-8B instance via OpenAI-compatible API servers.
- Research and Development: Provides a stable and correctly configured Qwen3-8B for experimentation and integration into larger systems.
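For API server deployment, a server such as vLLM can be started with `vllm serve syj4205/broken-model-fixed`, after which it exposes an OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below builds a request body for that endpoint using only the standard library; the local URL in the comment is an assumption (vLLM's default is `http://localhost:8000/v1`), and the message contents are illustrative.

```python
import json

# Request body for an OpenAI-compatible /v1/chat/completions endpoint.
body = {
    "model": "syj4205/broken-model-fixed",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What does a safetensors index file do?"},
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}
payload = json.dumps(body)

# With the chat_template fix in place, the server applies the Qwen3
# template to `messages` itself; without it, this request would fail.
# Send the payload with any HTTP client, e.g.:
#   curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" -d "$PAYLOAD"
print(payload[:42])
```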