rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b
The rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b model, developed by rinna, is a 32-billion-parameter DeepSeek-R1 distilled variant built on the Qwen2.5 Bakeneko architecture. Constructed with a Chat Vector merge and then refined with Odds Ratio Preference Optimization (ORPO), it is designed to deliver strong performance on Japanese language tasks. It adheres to the DeepSeek-R1 chat format, making it well suited to Japanese-centric reasoning and conversational applications.
Overview
This model is a 32-billion-parameter DeepSeek-R1 distilled model based on the Qwen2.5 Bakeneko architecture, fine-tuned by rinna for enhanced performance on Japanese language tasks.
Key Capabilities & Training
- DeepSeek-R1 Distillation: Reasoning and instruction-following behaviour is transferred via a Chat Vector merge: the weight difference between DeepSeek-R1-Distill-Qwen-32B and Qwen2.5-32B is added to the Qwen2.5 Bakeneko base (see the sketch after this list).
- ORPO Fine-tuning: Further refined with Odds Ratio Preference Optimization (ORPO) on 1.2k curated preference samples generated by DeepSeek-R1 (the objective is summarized after this list).
- Japanese Language Optimization: Built to excel at Japanese language processing while adhering to the DeepSeek-R1 chat format (a usage sketch follows this list).
- Architecture: A 64-layer transformer language model with a hidden size of 5120, inheriting the Qwen2.5 architecture.
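The Chat Vector step is, at its core, parameter-space arithmetic. Below is a minimal sketch of that idea, assuming the public Hugging Face IDs named above and that the merge target is rinna's Qwen2.5 Bakeneko base; which tensors rinna actually includes or excludes is an assumption here, so treat this as illustrative rather than the released recipe.

```python
# Illustrative Chat Vector merge (NOT rinna's released recipe).
# chat vector = DeepSeek-R1-Distill-Qwen-32B weights - Qwen2.5-32B weights,
# added onto the Japanese Bakeneko base. Needs enough RAM for three 32B models.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B", torch_dtype=torch.bfloat16)
reasoner = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", torch_dtype=torch.bfloat16)
target = AutoModelForCausalLM.from_pretrained(
    "rinna/qwen2.5-bakeneko-32b", torch_dtype=torch.bfloat16)  # assumed merge target

base_sd = base.state_dict()
reasoner_sd = reasoner.state_dict()

with torch.no_grad():
    for name, param in target.named_parameters():
        # Embedding / LM-head shapes can differ between tokenizers; skip those.
        if name not in reasoner_sd or reasoner_sd[name].shape != param.shape:
            continue
        param.add_(reasoner_sd[name] - base_sd[name])

target.save_pretrained("bakeneko-32b-chat-vector-merged")
```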
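For reference, the ORPO objective (Hong et al., 2024) adds a log odds-ratio penalty to the usual SFT loss on the preferred response, so no separate reference model is required; here y_w is the chosen response, y_l the rejected one, σ the sigmoid, and λ a weighting term:

```latex
% ORPO objective: SFT loss on the chosen response plus a weighted
% odds-ratio term that pushes preferred responses above rejected ones.
\mathcal{L}_{\mathrm{ORPO}}
  = \mathbb{E}_{(x,\,y_w,\,y_l)}
      \bigl[\, \mathcal{L}_{\mathrm{SFT}} + \lambda\,\mathcal{L}_{\mathrm{OR}} \,\bigr],
\qquad
\mathcal{L}_{\mathrm{OR}}
  = -\log \sigma\!\left(
      \log \frac{\operatorname{odds}_\theta(y_w \mid x)}
                {\operatorname{odds}_\theta(y_l \mid x)} \right),
\qquad
\operatorname{odds}_\theta(y \mid x)
  = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
```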
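And a minimal inference sketch using the standard transformers API, assuming the repository ships the DeepSeek-R1 chat template (as the chat-format note above suggests); the prompt is an arbitrary example.

```python
# Minimal generation example via the bundled chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Example Japanese prompt: "Explain the difference between probability and likelihood."
messages = [{"role": "user", "content": "確率と尤度の違いを説明してください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

As with other DeepSeek-R1 distills, the model typically emits its chain of thought inside <think>...</think> tags before the final answer, so downstream code may want to strip that span.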
Benchmarking Highlights
- Achieves 77.43 on the Japanese LM Evaluation Harness.
- Scores 8.58 (first turn) and 8.19 (multi-turn) on Japanese MT-Bench, demonstrating strong conversational ability in Japanese.
Good For
- Applications requiring high-performance Japanese language understanding and generation.
- Use cases that benefit from DeepSeek-R1's distilled reasoning capabilities.
- Developers seeking a 32B parameter model optimized for Japanese conversational AI and instruction following.