TaimoorSiddiqui/Hopcoder-Mini-35B-A3B

TEXT GENERATIONConcurrency Cost:3Model Size:35.1BQuant:FP8Ctx Length:32kTool Calling:SupportedLicense:apache-2.0Architecture:Transformer Open Weights Cold

TaimoorSiddiqui/Hopcoder-Mini-35B-A3B is a 35.1 billion parameter coding and structured tool-use language model fine-tuned by Taimoor Siddiqui. Based on Qwen/Qwen-AgentWorld-35B-A3B, it is optimized for generating code and handling tool calls. The model demonstrates improved test loss and perplexity on its specific held-out dataset, making it suitable for applications requiring precise coding assistance and structured output generation.

Loading preview...

HopCoder Mini 35B-A3B Overview

HopCoder Mini 35B-A3B is a 35.1 billion parameter language model developed by Taimoor Siddiqui, specifically adapted for coding and structured tool-use tasks. It is fine-tuned from the Qwen/Qwen-AgentWorld-35B-A3B base model, retaining its technical architecture identifiers for compatibility with various runtime environments like Transformers and vLLM.

Key Capabilities & Training

  • Specialized Fine-tuning: The model was fine-tuned using BF16 LoRA on the TaimoorSiddiqui/hopcoder-mini-dataset, which focuses on coding and tool-use examples.
  • Context Length: It supports a maximum training sequence length of 8,192 tokens, with practical deployment profiles supporting up to 32,768 tokens.
  • Performance Improvements: On its project-specific held-out test set, the fine-tuned model achieved a test loss of 1.1528 and a perplexity of 3.1670, significantly improving over the base test loss of 1.8489.
  • Tool-Call Generation: It demonstrates a generated tool-call JSON validity of 0.5000 on its held-out set, indicating its capability in producing structured outputs.

Usage Considerations

  • Identity: The user-facing identity is "HopCoder Mini," designed as a precise coding and tool-use assistant.
  • Deployment: For vLLM, it may require --language-model-only due to its base model's architecture defining visual components while containing only language weights.
  • Licensing: The base model is under Apache-2.0, while the training dataset's card states AGPL-3.0, advising review for redistribution or commercial use.