FlagRelease/Qwen3-8B-mthreads-FlagOS

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Architecture: Transformer

FlagRelease/Qwen3-8B-mthreads-FlagOS is an 8-billion-parameter Qwen3 model adapted for Mthreads chips using the FlagOS software stack. Developed by FlagRelease, it leverages the FlagScale distributed framework and the FlagGems operator library for efficient deployment and inference on heterogeneous computing resources. The model is optimized for seamless integration and performance consistency on Mthreads hardware, providing out-of-the-box inference capabilities.


What is FlagRelease/Qwen3-8B-mthreads-FlagOS?

This model is an 8-billion-parameter variant of the Qwen3 large language model, adapted and optimized for deployment on Mthreads chips using the FlagOS software stack. Developed by FlagRelease, it represents a unified approach to heterogeneous computing, enabling efficient, automated model migration across diverse hardware.

Key Capabilities & Features

  • Integrated Deployment: Deep integration with the open-source FlagScale framework provides out-of-the-box inference scripts and pre-configured hardware/software parameters. A ready-to-use FlagOS-mthreads container image allows for deployment within minutes.
  • Performance Consistency: Rigorous benchmark testing ensures that performance and results from the FlagOS stack are consistent with native stacks on public benchmarks.
  • FlagScale Framework: Utilizes FlagScale for distributed training and inference, offering a unified deployment interface, intelligent parallel optimization, and seamless operator switching.
  • FlagGems Operator Library: Leverages FlagGems, a Triton-based, cross-architecture operator library with over 100 operators, supporting 7 accelerator backends for high efficiency and performance.
  • FlagEval Evaluation: Performance is assessed using FlagEval (Libra), a comprehensive evaluation system that supports multi-dimensional assessments across various tasks and modalities.
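Once the FlagOS-mthreads container is running, inference stacks of this kind commonly expose an OpenAI-compatible HTTP endpoint. The sketch below is a minimal client-side example under that assumption; the endpoint URL, port, and serving interface are placeholders, so check the FlagScale documentation for the actual deployment commands and API shape.

```python
import json

# Placeholder address: assumes the FlagScale deployment serves an
# OpenAI-compatible chat completions endpoint on localhost (an assumption,
# not confirmed by this model card).
API_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload for this model."""
    return {
        "model": "FlagRelease/Qwen3-8B-mthreads-FlagOS",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    payload = build_chat_request("Summarize the FlagOS stack in one sentence.")
    print(json.dumps(payload, indent=2))
    # To send the request (requires a running server):
    #   import requests
    #   reply = requests.post(API_URL, json=payload).json()
```

The payload format mirrors the widely used chat-completions convention, so standard OpenAI-compatible clients should also work if the server follows that interface.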

Benchmark Results

Comparative evaluation of the FlagOS stack on Mthreads against native Qwen3-8B on H100-CUDA shows competitive performance on key metrics:

  • AIME_0fewshot: 0.800 (vs 0.700 on H100)
  • MMLU_5fewshot: 0.706 (vs 0.699 on H100)

Good For

  • Developers targeting Mthreads chip architectures who need an optimized Qwen3 model.
  • Users seeking a highly integrated and easily deployable LLM solution for specific hardware.
  • Environments requiring consistent performance across heterogeneous computing resources.