ParrotRouter/Qwen3-4B-Instruct-2507-20250808-233922-0 is a 4 billion parameter Qwen3-4B based model, created by ParrotRouter through layer-wise merging of multiple fine-tuned variants. Optimized specifically for graduate-level physics question answering (GPQA Diamond), it achieves a 45.45% score on this benchmark and 72.51% on MMLU. This model is designed to combine strengths from various source models for specialized performance.
Loading preview...
Model Overview
ParrotRouter/Qwen3-4B-Instruct-2507-20250808-233922-0 is a 4 billion parameter model built upon the Qwen3-4B architecture. It was developed by ParrotRouter using a unique layer-wise merging technique, combining layers from various fine-tuned Qwen3-4B variants to achieve optimized performance on specific tasks.
Key Capabilities & Performance
This model demonstrates strong performance on academic benchmarks:
- GPQA Diamond (0-shot): Achieves a score of 45.45% on graduate-level physics question answering, indicating its specialization in complex scientific reasoning.
- MMLU (5-shot): Scores 72.51% across 57 subjects, showcasing its broad language understanding capabilities.
Unique Approach
What sets this model apart is its layer-wise merging process. Instead of traditional fine-tuning, individual transformer layers (0-35) are selected from different source models, and then combined. This allows for the integration of specialized knowledge and capabilities from multiple fine-tuned models into a single, optimized model. The non-layer weights (embeddings and final layers) are derived from the base Qwen3-4B model.
Intended Use Cases
This model is particularly well-suited for:
- Graduate-level physics Q&A: Its primary optimization target makes it effective for complex scientific inquiries.
- Research and experimentation: Ideal for exploring the effectiveness of layer-wise merging techniques and specialized model development.
Limitations
As an experimental merge, its performance may vary on tasks outside its specific optimization targets. Users should validate its suitability for their particular use case before deployment.