Qwen2-7B-TS2: Enhanced Fine-Tuning for Stability and Diversity
xzybit/qwen2-7b-ts2 is a 7.6 billion parameter model based on Qwen2-7B, distinguished by its novel fine-tuning approach called TS^2 (Training with Sparsemax+, Testing with Softmax). This method introduces an entropy-aware adaptive weighting mechanism into the training objective, dynamically adjusting emphasis based on predictive entropy.
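The core of the "train with sparsemax, test with softmax" idea is the difference between the two output mappings. The sketch below (plain numpy, not the model's actual training code) implements the standard sparsemax projection of Martins & Astudillo alongside softmax, showing that sparsemax can assign exactly zero probability to low-scoring tokens while softmax keeps every token strictly positive:

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of logits z onto the probability
    simplex. Unlike softmax, it can place exactly zero mass on tokens."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # logits in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum  # tokens that keep nonzero mass
    k_z = k[support][-1]                 # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z    # threshold subtracted from logits
    return np.maximum(z - tau, 0.0)

def softmax(z):
    """Standard softmax: always assigns strictly positive probability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = [1.2, 0.8, -1.0]
print(sparsemax(logits))  # [0.7, 0.3, 0.0] -- third token gets zero mass
print(softmax(logits))    # all entries strictly positive
```

Training against a sparse target distribution and decoding with softmax is the general pattern the TS^2 name suggests; the model's exact objective is specified in the paper, not here.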
Key Capabilities and Innovations
- Improved Alignment Stability: The TS^2 objective is specifically designed to enhance the stability of model alignment during supervised fine-tuning.
- Mitigated Probability Collapse: It actively works to prevent token-level probability collapse, a common issue in standard likelihood maximization.
- Enhanced Accuracy and Diversity: By addressing overconfident likelihood-based training, the model aims to improve both the accuracy of predictions and the diversity of generated tokens.
- Adaptive Weighting: Unlike uniform likelihood maximization, TS^2 uses an adaptive weighting mechanism that responds to predictive entropy, preventing issues like inference-time mode collapse and reduced generalization.
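The adaptive-weighting idea in the last bullet can be illustrated with a small sketch. The weighting function below is hypothetical (the paper's actual formulation is not reproduced here): it scales each token's negative log-likelihood by the prediction's normalized entropy, so already-confident (low-entropy) predictions contribute less gradient pressure toward further peaking:

```python
import numpy as np

def predictive_entropy(p, eps=1e-12):
    """Shannon entropy of a predictive distribution p."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p + eps))

def entropy_weight(p, alpha=1.0):
    """Illustrative entropy-aware weight in [0, 1]: entropy normalized by
    its maximum (the uniform distribution), raised to a temperature alpha.
    NOTE: this exact form is an assumption for illustration, not the
    published TS^2 weighting."""
    h = predictive_entropy(p)
    h_max = np.log(len(p))
    return (h / h_max) ** alpha

def weighted_nll(p, target, alpha=1.0, eps=1e-12):
    """Per-token loss: entropy-weighted negative log-likelihood."""
    return entropy_weight(p, alpha) * -np.log(p[target] + eps)

uniform = np.ones(4) / 4
peaked = np.array([0.97, 0.01, 0.01, 0.01])
print(entropy_weight(uniform))  # 1.0: maximum-entropy prediction, full weight
print(entropy_weight(peaked))   # much smaller: overconfident, down-weighted
```

A scheme like this counteracts the uniform likelihood maximization mentioned above, where every token is pushed toward a one-hot target regardless of how peaked the prediction already is.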
Why Choose This Model?
This model is particularly suited to use cases where maintaining token diversity, preventing mode collapse, and ensuring robust generalization are critical. Its training methodology offers a potential advantage over conventionally fine-tuned models by producing more stable and diverse outputs, as detailed in the associated ICLR 2026 research paper.