rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b
The rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b model, developed by rinna, is a 32-billion-parameter DeepSeek-R1 distilled variant built on the Qwen2.5 Bakeneko architecture. Constructed with a Chat Vector merge and then refined with Odds Ratio Preference Optimization (ORPO), it is designed to deliver strong performance on Japanese language tasks. It adheres to the DeepSeek-R1 chat format, making it well suited to Japanese-centric reasoning and conversational applications.
Overview
This model is a 32-billion-parameter DeepSeek-R1 distilled model based on the Qwen2.5 Bakeneko architecture, fine-tuned by rinna for enhanced performance on Japanese language tasks.
Key Capabilities & Training
- DeepSeek-R1 Distillation: Reasoning and instruction-following behaviour is transferred via a Chat Vector merge: the weight difference between DeepSeek-R1-Distill-Qwen-32B and Qwen2.5-32B is added to the Qwen2.5 Bakeneko base (see the sketch after this list).
- ORPO Fine-tuning: Further refined with Odds Ratio Preference Optimization (ORPO) on 1.2k curated preference samples generated by DeepSeek-R1 (the objective is summarized after this list).
- Japanese Language Optimization: Built to excel at Japanese language processing while adhering to the DeepSeek-R1 chat format (a usage sketch follows this list).
- Architecture: A 64-layer transformer language model with a hidden size of 5120, inheriting the Qwen2.5 architecture.
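The Chat Vector step is, at its core, parameter-space arithmetic. Below is a minimal sketch of that idea, assuming the public Hugging Face IDs named above and that the merge target is rinna's Qwen2.5 Bakeneko base; which tensors rinna actually includes or excludes is an assumption here, so treat this as illustrative rather than the released recipe.

```python
# Illustrative Chat Vector merge (NOT rinna's released recipe).
# chat vector = DeepSeek-R1-Distill-Qwen-32B weights - Qwen2.5-32B weights,
# added onto the Japanese Bakeneko base. Needs enough RAM for three 32B models.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B", torch_dtype=torch.bfloat16)
reasoner = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", torch_dtype=torch.bfloat16)
target = AutoModelForCausalLM.from_pretrained(
    "rinna/qwen2.5-bakeneko-32b", torch_dtype=torch.bfloat16)  # assumed merge target

base_sd = base.state_dict()
reasoner_sd = reasoner.state_dict()

with torch.no_grad():
    for name, param in target.named_parameters():
        # Embedding / LM-head shapes can differ between tokenizers; skip those.
        if name not in reasoner_sd or reasoner_sd[name].shape != param.shape:
            continue
        param.add_(reasoner_sd[name] - base_sd[name])

target.save_pretrained("bakeneko-32b-chat-vector-merged")
```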
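For reference, the ORPO objective (Hong et al., 2024) adds a log odds-ratio penalty to the usual SFT loss on the preferred response, so no separate reference model is required; here y_w is the chosen response, y_l the rejected one, σ the sigmoid, and λ a weighting term:

```latex
% ORPO objective: SFT loss on the chosen response plus a weighted
% odds-ratio term that pushes preferred responses above rejected ones.
\mathcal{L}_{\mathrm{ORPO}}
  = \mathbb{E}_{(x,\,y_w,\,y_l)}
      \bigl[\, \mathcal{L}_{\mathrm{SFT}} + \lambda\,\mathcal{L}_{\mathrm{OR}} \,\bigr],
\qquad
\mathcal{L}_{\mathrm{OR}}
  = -\log \sigma\!\left(
      \log \frac{\operatorname{odds}_\theta(y_w \mid x)}
                {\operatorname{odds}_\theta(y_l \mid x)} \right),
\qquad
\operatorname{odds}_\theta(y \mid x)
  = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
```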
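And a minimal inference sketch using the standard transformers API, assuming the repository ships the DeepSeek-R1 chat template (as the chat-format note above suggests); the prompt is an arbitrary example.

```python
# Minimal generation example via the bundled chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Example Japanese prompt: "Explain the difference between probability and likelihood."
messages = [{"role": "user", "content": "確率と尤度の違いを説明してください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

As with other DeepSeek-R1 distills, the model typically emits its chain of thought inside <think>...</think> tags before the final answer, so downstream code may want to strip that span.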
Benchmarking Highlights
- Achieves 77.43 on the Japanese LM Evaluation Harness.
- Scores 8.58 (first turn) and 8.19 (multi-turn) on Japanese MT-Bench, demonstrating strong conversational ability in Japanese.
Good For
- Applications requiring high-performance Japanese language understanding and generation.
- Use cases that benefit from DeepSeek-R1's distilled reasoning capabilities.
- Developers seeking a 32B parameter model optimized for Japanese conversational AI and instruction following.