ReWiz-7B: Enhanced Reasoning and De-censored Mistral Fine-tune
ReWiz-7B is a 7-billion-parameter model developed by theprint, built on the Mistral 7B Instruct (v0.3) architecture. The fine-tune follows a dual-focus training approach: half of the training data targets improved reasoning ability, and the other half targets de-censoring the model's outputs.
Key Capabilities & Training
- Enhanced Reasoning: A significant portion of its training utilized datasets such as EvolKit-20k and reasoning-base-20k, specifically targeting the improvement of logical deduction and problem-solving skills.
- De-censored Outputs: The WizardLM dataset was integrated to produce more open, less restricted responses, broadening the model's conversational and generative range.
- Efficient Fine-tuning: The model was fine-tuned using Unsloth and Hugging Face's TRL library, enabling faster training.
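As a Mistral Instruct derivative, ReWiz-7B presumably expects the standard Mistral instruction format, which wraps user turns in `[INST] ... [/INST]` tags. The sketch below is illustrative only: the `build_prompt` helper is not part of the model's tooling, and the system-message handling is one common convention rather than a documented requirement.

```python
def build_prompt(user_message: str, system: str = "") -> str:
    """Format a single-turn prompt in the Mistral-Instruct style.

    Mistral 7B Instruct models wrap the user turn in [INST] ... [/INST];
    when a system message is used, it is commonly prepended inside the
    first instruction block (an assumption, not an official spec).
    """
    content = f"{system}\n\n{user_message}" if system else user_message
    return f"<s>[INST] {content} [/INST]"


prompt = build_prompt(
    "List three ways to verify a logical argument.",
    system="You are a careful reasoning assistant.",
)
```

In practice, prefer the tokenizer's own `apply_chat_template` (if the repository ships a chat template), since that guarantees the exact formatting the model was trained with.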
Performance Insights
Evaluations on the Open LLM Leaderboard show an average score of 17.54. Specific metrics include 40.48 on IFEval (0-Shot) and 23.50 on BBH (3-Shot), reflecting its performance on instruction following and challenging multi-step reasoning tasks. Its MATH Lvl 5 (4-Shot) score is 2.57, so its strength lies in its targeted reasoning and de-censoring objectives rather than advanced mathematical proficiency.
Good For
- Applications requiring a 7B model with improved reasoning capabilities.
- Use cases where less restricted or de-censored model outputs are desired.
- Developers looking for a Mistral-based model with specific enhancements in logical processing and conversational freedom.