summerMC/Sakura

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 22, 2026License:otherArchitecture:Transformer Cold

Sakura is an experimental 1.5 billion parameter Qwen2-family merged language model developed by summerMC, combining SakanaAI/TinySwallow-1.5B-Instruct and WeiboAI/VibeThinker-1.5B. It is specifically designed to maintain strong Japanese instruction-following and conversational abilities while lightly integrating reasoning characteristics. This model excels at Japanese chat, simple Q&A, basic reasoning, and elementary Python code generation, with a context length of 32768 tokens.

Loading preview...

Model Overview

Sakura is an experimental 1.5 billion parameter merged language model developed by summerMC, built upon the Qwen2 architecture. It combines SakanaAI/TinySwallow-1.5B-Instruct for its strong Japanese instruction-following and conversational capabilities with WeiboAI/VibeThinker-1.5B to lightly inject reasoning characteristics, particularly in math and algorithmic reasoning. The merge was performed using mergekit with a SLERP ratio of t=0.05 for VibeThinker, carefully chosen to prevent degradation of Japanese performance.

Key Capabilities

  • Japanese Instruction Following: Designed to preserve the robust Japanese conversational and instruction-following behavior of its primary parent model.
  • Simple Reasoning: Incorporates basic reasoning abilities for tasks like simple arithmetic and mathematical explanations.
  • Code Generation: Capable of generating basic Python code snippets.
  • Lightweight Q&A: Suitable for simple question-answering in Japanese.

Intended Use Cases

  • Japanese Chatbots: Ideal for conversational agents requiring Japanese language proficiency.
  • Educational Tools: Can assist with simple mathematical problems or programming exercises.
  • Experimental Research: Useful for exploring small-model weight merging techniques and their impact on multilingual and reasoning tasks.

Limitations

As an experimental merge, Sakura may exhibit limitations such as hallucination, incorrect reasoning, language mixing, and repetitive output. It is not intended for production, mission-critical applications, or high-stakes decision-making, and should be thoroughly evaluated for specific use cases.