suyashdb/broken-model-fixed

Text Generation
Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Mar 1, 2026 · Architecture: Transformer

suyashdb/broken-model-fixed is an 8 billion parameter causal language model based on the Qwen3 architecture, specifically Qwen/Qwen3-8B. This model was fixed to include the correct base model declaration, a functional chat template, and the essential tokenizer files, resolving issues that previously prevented inference and standalone tokenizer loading. It is designed for general text generation and chat applications and, with these fixes, works with standard inference servers.


Overview

suyashdb/broken-model-fixed is an 8 billion parameter model derived from Qwen/Qwen3-8B. This repository addresses critical configuration and file omissions that previously rendered the model unusable for inference and standalone tokenizer loading. The primary fixes include correcting the base_model declaration, adding a complete Jinja2 chat_template to tokenizer_config.json, and uploading missing tokenizer files (vocab.json, tokenizer.json, special_tokens_map.json).
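To make the fix concrete, the relevant tokenizer_config.json entries might look like the following schematic fragment. This is an abbreviated illustration, not the repo's actual contents: the chat_template string shown is a minimal placeholder (the real template is far longer, handling tool calls and the thinking toggle), and the tokenizer_class is assumed to match upstream Qwen conventions.

```json
{
  "tokenizer_class": "Qwen2Tokenizer",
  "chat_template": "{% for message in messages %}<|im_start|>{{ message.role }}\n{{ message.content }}<|im_end|>\n{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
}
```

With a chat_template present, OpenAI-compatible servers can render /chat/completions message lists into a prompt string, and AutoTokenizer can load the tokenizer without falling back to the model repo's other files.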

Key Fixes & Capabilities

  • Corrected Base Model: The base_model was updated from meta-llama/Meta-Llama-3.1-8B to Qwen/Qwen3-8B, aligning with the actual architecture and configuration.
  • Functional Chat Template: The absence of a chat_template previously prevented OpenAI-compatible inference servers from processing /chat/completions requests. The added template supports system, user, and assistant messages, tool calls, and a thinking mode toggle.
  • Complete Tokenizer Files: Essential tokenizer files were uploaded, enabling the tokenizer to be loaded and used independently.
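The practical effect of the chat template can be sketched in plain Python. The snippet below approximates the ChatML-style rendering used by Qwen-family models; it is an illustrative stand-in, not the repo's actual Jinja2 template, which additionally handles tool calls and the thinking toggle.

```python
# Illustrative sketch of what a ChatML-style chat template renders for
# Qwen-family models. The real rendering is performed by the Jinja2
# chat_template in tokenizer_config.json (via apply_chat_template).

def render_chatml(messages, add_generation_prompt=True):
    """Flatten a list of {role, content} dicts into a ChatML prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

This is the string an inference server feeds to the model; without a chat_template, the server has no way to produce it and /chat/completions requests fail.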

reasoning_effort Clarification

This model, like the base Qwen3-8B, does not natively support reasoning_effort parameters (e.g., "low", "high") as seen in OpenAI's o-series models. Qwen3-8B was not trained with budget-forcing, meaning it cannot dynamically adjust its reasoning depth based on such hints. It operates with a binary thinking/non-thinking mode. Implementing true reasoning_effort would require retraining with budget-forcing and specific inference server logic to interpret and apply these parameters.
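The binary nature of the toggle can be sketched as follows. The sketch assumes the convention used by upstream Qwen3, where non-thinking mode is implemented by pre-filling an empty think block in the assistant turn so the model skips its reasoning phase entirely; the helper name is hypothetical.

```python
# Sketch of Qwen3's binary thinking toggle, as opposed to a graded
# reasoning_effort. Assumption: as in upstream Qwen3's chat template,
# disabling thinking pre-fills an empty <think></think> block, so the
# model answers directly instead of scaling its reasoning depth.

def assistant_turn_prefix(enable_thinking: bool) -> str:
    prefix = "<|im_start|>assistant\n"
    if not enable_thinking:
        # Empty think block: the model emits the final answer immediately.
        prefix += "<think>\n\n</think>\n\n"
    return prefix

# There is no intermediate setting: a reasoning_effort of "low" or "high"
# has no effect, because the model was never trained against a token budget.
print(assistant_turn_prefix(True))
print(assistant_turn_prefix(False))
```

In other words, the only lever available is on/off; anything finer-grained would require the budget-forcing retraining and server-side support described above.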