ParrotRouter/Qwen3-4B-Instruct-2507-20250808-233922-0
Text generation · 4B parameters · BF16 · 32k context · Apache-2.0 · Open weights

ParrotRouter/Qwen3-4B-Instruct-2507-20250808-233922-0 is a 4-billion-parameter model based on Qwen3-4B, created by ParrotRouter through layer-wise merging of multiple fine-tuned variants. Optimized specifically for graduate-level science question answering (GPQA Diamond), it scores 45.45% on that benchmark and 72.51% on MMLU. The merge is designed to combine strengths from its source models for specialized performance.


Model Overview

ParrotRouter/Qwen3-4B-Instruct-2507-20250808-233922-0 is a 4 billion parameter model built upon the Qwen3-4B architecture. It was developed by ParrotRouter using a unique layer-wise merging technique, combining layers from various fine-tuned Qwen3-4B variants to achieve optimized performance on specific tasks.

Key Capabilities & Performance

This model demonstrates strong performance on academic benchmarks:

  • GPQA Diamond (0-shot): Achieves 45.45% on graduate-level science question answering, indicating its specialization in complex scientific reasoning.
  • MMLU (5-shot): Scores 72.51% across 57 subjects, showcasing its broad language understanding capabilities.
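For context on the headline number: GPQA Diamond contains 198 questions, so the reported 45.45% corresponds to 90 correct answers. A minimal sketch of how such a score is computed (the count of 90 is derived from the reported percentage, not from a rerun of the benchmark):

```python
def benchmark_accuracy(num_correct: int, num_questions: int) -> float:
    """Return accuracy as a percentage, rounded to two decimal places."""
    return round(100.0 * num_correct / num_questions, 2)

# GPQA Diamond has 198 questions; 90 correct reproduces the reported score.
print(benchmark_accuracy(90, 198))  # 45.45
```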

Unique Approach

What sets this model apart is its layer-wise merging process. Instead of further fine-tuning, each of the 36 transformer layers (indices 0-35) is selected from one of several fine-tuned Qwen3-4B source models, and the selected layers are combined into a single network. This integrates specialized knowledge and capabilities from multiple fine-tuned models into one optimized model. The non-layer weights (embeddings and final layers) are taken from the base Qwen3-4B model.
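The merging procedure can be sketched as follows. This is an illustrative reconstruction, not ParrotRouter's actual code: state dicts are modeled as plain dictionaries with Hugging Face-style key names (`model.layers.<i>.…`), and the layer-to-source assignment (`layer_sources`) is a hypothetical example.

```python
def merge_layerwise(base_sd, source_sds, layer_sources, num_layers=36):
    """Build a merged state dict: each transformer layer is copied from the
    source model assigned to it; all other weights (embeddings, lm_head, ...)
    come from the base model."""
    merged = dict(base_sd)  # start from the base weights
    for layer_idx in range(num_layers):
        src_sd = source_sds[layer_sources[layer_idx]]
        prefix = f"model.layers.{layer_idx}."
        for key, tensor in src_sd.items():
            if key.startswith(prefix):
                merged[key] = tensor  # overwrite this layer with the source's
    return merged

# Toy example: two hypothetical source variants, a two-layer "model".
base = {
    "model.embed_tokens.weight": "base-emb",
    "model.layers.0.self_attn.q_proj.weight": "base-L0",
    "model.layers.1.self_attn.q_proj.weight": "base-L1",
    "lm_head.weight": "base-head",
}
sources = {
    "variant_a": {"model.layers.0.self_attn.q_proj.weight": "a-L0",
                  "model.layers.1.self_attn.q_proj.weight": "a-L1"},
    "variant_b": {"model.layers.0.self_attn.q_proj.weight": "b-L0",
                  "model.layers.1.self_attn.q_proj.weight": "b-L1"},
}
merged = merge_layerwise(base, sources,
                         layer_sources={0: "variant_a", 1: "variant_b"},
                         num_layers=2)
```

In the toy run, layer 0 ends up from `variant_a`, layer 1 from `variant_b`, while the embeddings and head stay from the base model, mirroring how the card describes the merge.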

Intended Use Cases

This model is particularly well-suited for:

  • Graduate-level science Q&A: Its primary optimization target (GPQA Diamond) makes it effective for complex scientific inquiries.
  • Research and experimentation: Ideal for exploring the effectiveness of layer-wise merging techniques and specialized model development.

Limitations

As an experimental merge, its performance may vary on tasks outside its specific optimization targets. Users should validate its suitability for their particular use case before deployment.