syj4205/broken-model-fixed

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: May 9, 2026 · Architecture: Transformer

syj4205/broken-model-fixed is an 8-billion-parameter Qwen3-based causal language model with a 32K context length. It is a corrected version of a previously non-functional Qwen3-8B upload, repaired so that it works properly behind an API server. Critical fixes to its chat template and shard mapping make it suitable for deployment in OpenAI-compatible API environments.


syj4205/broken-model-fixed: A Repaired Qwen3-8B Model

This model, syj4205/broken-model-fixed, is a repaired build of a previously non-functional Qwen3-8B upload. The original could not serve /chat/completions requests because of configuration errors in its repository files. This fixed version resolves those issues, making the Qwen3-8B architecture usable for deployment.

Key Fixes Implemented

  • chat_template Addition: The official Qwen3 Jinja2 chat template was added to tokenizer_config.json. This is vital for OpenAI-compatible API servers (like vLLM or FriendliAI) to correctly format user messages into model input, preventing API failures.
  • Shard Mapping Correction: Errors in model.safetensors.index.json were resolved. Specifically, q_proj, k_proj, and v_proj for Layer 7 were incorrectly pointing to the wrong shard, leading to weight loading errors during inference. This has been corrected to ensure all tensors load from the proper shard.
  • Metadata Update: The base_model metadata in README.md was corrected from meta-llama/Meta-Llama-3.1-8B to Qwen/Qwen3-8B, accurately reflecting the model's true architecture and weights.
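The two file-level fixes above can be sketched as a small patch script. This is an illustrative sketch, not the actual repair script: the chat template below is a simplified stand-in for the much longer official Qwen3 Jinja2 template, and the shard filename is an assumed example, since the card does not state which shard Layer 7 should map to.

```python
import json
from pathlib import Path

# Simplified stand-in for the official Qwen3 Jinja2 chat template
# (the real template is much longer and handles system/tool messages).
CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

def patch_repo(repo_dir: str) -> None:
    repo = Path(repo_dir)

    # Fix 1: add the missing chat_template to tokenizer_config.json so
    # OpenAI-compatible servers can format chat messages into model input.
    tok_path = repo / "tokenizer_config.json"
    tok_cfg = json.loads(tok_path.read_text())
    tok_cfg["chat_template"] = CHAT_TEMPLATE
    tok_path.write_text(json.dumps(tok_cfg, indent=2))

    # Fix 2: repoint Layer 7's attention projections to the correct shard
    # in model.safetensors.index.json (shard name here is hypothetical).
    idx_path = repo / "model.safetensors.index.json"
    index = json.loads(idx_path.read_text())
    correct_shard = "model-00002-of-00005.safetensors"  # assumed shard name
    for proj in ("q_proj", "k_proj", "v_proj"):
        key = f"model.layers.7.self_attn.{proj}.weight"
        index["weight_map"][key] = correct_shard
    idx_path.write_text(json.dumps(index, indent=2))
```

After patching, a loader reads `weight_map` from the index file to decide which shard each tensor comes from, so correcting those three entries is what stops the weight-loading errors at inference time.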

What Was Not Changed

  • config.json: The core architectural values remain consistent with the official Qwen3-8B specifications.
  • tokenizer_class: The model intentionally reuses Qwen2Tokenizer as Qwen3 shares the same Byte Pair Encoding (BPE) scheme.
  • eos_token_id: The end-of-sequence token IDs [151645, 151643] are retained, matching official Qwen3 generation configurations.
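The invariants above are easy to verify locally before deployment. The sketch below assumes the repo follows the usual Hugging Face layout, with `tokenizer_class` in tokenizer_config.json and `eos_token_id` in generation_config.json; the helper name `check_configs` is ours, not part of any library.

```python
import json
from pathlib import Path

# [151645, 151643] are <|im_end|> and <|endoftext|> in the Qwen vocabulary.
EXPECTED_EOS_IDS = [151645, 151643]

def check_configs(repo_dir: str) -> list[str]:
    """Return a list of problems found; an empty list means the repo passes."""
    repo = Path(repo_dir)
    problems = []

    tok_cfg = json.loads((repo / "tokenizer_config.json").read_text())
    if tok_cfg.get("tokenizer_class") != "Qwen2Tokenizer":
        problems.append("unexpected tokenizer_class")
    if "chat_template" not in tok_cfg:
        problems.append("chat_template missing")

    gen_cfg = json.loads((repo / "generation_config.json").read_text())
    if gen_cfg.get("eos_token_id") != EXPECTED_EOS_IDS:
        problems.append("eos_token_id mismatch")

    return problems
```

Running a check like this against a fresh download is a quick way to confirm the fixes are present before pointing an API server at the weights.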

Ideal Use Cases

  • API Server Deployment: This model is specifically designed for developers looking to deploy a functional Qwen3-8B instance via OpenAI-compatible API servers.
  • Research and Development: Provides a stable and correctly configured Qwen3-8B for experimentation and integration into larger systems.
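As a deployment sketch, the model can be served with vLLM's OpenAI-compatible server and queried like any OpenAI endpoint. The port and flags below are illustrative assumptions, and a GPU host is required; this is not an official launch configuration for the model.

```shell
# Serve the model with vLLM's OpenAI-compatible server
# (port and max length are illustrative; 32768 matches the 32K context).
vllm serve syj4205/broken-model-fixed --max-model-len 32768 --port 8000

# Query /chat/completions; this only works because the repaired
# tokenizer_config.json now carries the chat_template.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "syj4205/broken-model-fixed",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```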