transformers-community/group-beam-search
The transformers-community/group-beam-search model is a 0.6 billion parameter decoder-only transformer based on Qwen/Qwen3-0.6B, designed to enhance text generation diversity. It implements a variant of beam search that produces more varied output candidates by penalizing similar sequences across groups of beams. This approach is particularly useful for generating a wider range of high-quality text outputs, offering more creative and less repetitive results than standard beam search.
Overview
This model implements a diverse beam search strategy, a variant of the standard beam search algorithm, to generate more varied and less repetitive text outputs. It is built upon the Qwen/Qwen3-0.6B base model, a 0.6 billion parameter decoder-only transformer. The core idea is to divide the total number of beams into groups and apply a diversity penalty that discourages different groups from selecting the same tokens, thereby increasing the overall diversity of the generated candidates.
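The group mechanism can be illustrated with a small sketch. The function below is a simplified, self-contained illustration of the Hamming-style diversity penalty used by diverse beam search (the function name and the plain-list representation are illustrative, not this repository's actual implementation): at each step, a token's score is reduced once for every earlier beam group that already chose that token.

```python
def apply_diversity_penalty(scores, prev_group_tokens, diversity_penalty):
    """Sketch of a Hamming diversity penalty: for each candidate token,
    subtract diversity_penalty once per earlier beam group that already
    selected that token at the current generation step."""
    counts = {}
    for tok in prev_group_tokens:
        counts[tok] = counts.get(tok, 0) + 1
    return [s - diversity_penalty * counts.get(tok, 0)
            for tok, s in enumerate(scores)]

# Toy step with a 4-token vocabulary: token 2 scores highest, but two
# earlier groups already picked it, so this group falls back to token 1.
scores = [0.1, 0.5, 1.0, 0.2]
penalized = apply_diversity_penalty(scores, prev_group_tokens=[2, 2],
                                    diversity_penalty=0.4)
best = max(range(len(penalized)), key=penalized.__getitem__)
```

With `diversity_penalty=0.4`, token 2's score drops from 1.0 to 0.2, so a different token wins in the later group; setting the penalty to 0 recovers ordinary beam search behavior.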
Key Capabilities
- Enhanced Output Diversity: Produces a broader range of distinct text sequences compared to traditional beam search.
- Configurable Parameters: Allows fine-tuning of generation behavior through `num_beams`, `num_beam_groups`, and `diversity_penalty`.
- Compatibility: Works with decoder-only transformer models, matching the `group_beam_search` functionality found in `transformers<4.56.0`.
- Post-processing: Integrates DoLa contrastive scoring for post-processing logits before token selection, further refining output quality.
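The DoLa contrastive scoring mentioned above can be sketched in isolation. The snippet below shows the core idea only (the function names and toy numbers are illustrative assumptions, not this repository's code): score each token by the difference between the final layer's log-probabilities and those of an earlier "premature" layer, which amplifies tokens whose probability grew in the later layers.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a plain list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def dola_contrast(final_logits, premature_logits):
    """Sketch of DoLa-style contrastive scoring: final-layer
    log-probabilities minus an earlier layer's log-probabilities."""
    lf = log_softmax(final_logits)
    lp = log_softmax(premature_logits)
    return [f - p for f, p in zip(lf, lp)]

# Toy 3-token vocabulary: token 1 gains the most probability between the
# premature layer and the final layer, so the contrast favors it.
contrast = dola_contrast([2.0, 1.0, 0.5], [2.0, 0.1, 0.5])
best = max(range(len(contrast)), key=contrast.__getitem__)
```

The full DoLa method also filters out low-probability tokens before contrasting; this sketch omits that step to keep the core scoring rule visible.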
Good For
- Applications requiring creative text generation where diverse outputs are preferred.
- Scenarios where avoiding repetitive or generic responses is crucial.
- Exploring a wider array of potential continuations in sequence generation tasks.