## Model Overview
XueZhang-bjtu/1.5B-cold-start-SFT is a 1.5-billion-parameter supervised fine-tuned (SFT) model that serves as the initial backbone of the M-Thinker project. It is built on the DeepSeek-R1-Distill-Qwen-1.5B architecture and fine-tuned on the M-Thinker-SFT-data. The model is a key component in developing Large Reasoning Models (LRMs) that aim to overcome limitations in processing non-English languages, particularly input-output language consistency and reasoning accuracy.
## Key Characteristics
- Foundation Model: Acts as a 'cold-start' SFT model, providing a strong base for subsequent reinforcement learning (RL) stages, such as those employing the GRPO algorithm with Language Consistency (LC) and Cross-lingual Thinking Alignment (CTA) rewards.
- Multilingual Reasoning Focus: Although this specific model is an SFT base, its development targets multilingual reasoning, addressing input-output language mismatch and the accuracy degradation LRMs often show in non-English contexts.
- Training Data: Fine-tuned with the M-Thinker-SFT-data, preparing it for more advanced multilingual reasoning tasks.
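The Language Consistency (LC) reward mentioned above can be illustrated with a toy sketch: score a response by the fraction of its alphabetic characters written in the expected script. This is an illustrative stand-in, not the actual M-Thinker reward; the helper names `script_of` and `language_consistency_reward` are our own.

```python
import unicodedata


def script_of(ch):
    """Coarse script label for a character, derived from its Unicode name."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if name.startswith("CJK"):
        return "han"
    if "HIRAGANA" in name or "KATAKANA" in name:
        return "kana"
    if "HANGUL" in name:
        return "hangul"
    if "CYRILLIC" in name:
        return "cyrillic"
    if "ARABIC" in name:
        return "arabic"
    if "LATIN" in name:
        return "latin"
    return "other"


def language_consistency_reward(text, target_script):
    """Return the fraction of letters in `text` matching `target_script`.

    Value lies in [0, 1]; 1.0 means the response is written entirely in
    the expected script (a toy proxy for an LC reward signal).
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    matches = sum(1 for c in letters if script_of(c) == target_script)
    return matches / len(letters)
```

A real LC reward would also need to handle the model's reasoning trace separately from its final answer, but the core idea of penalizing off-language tokens is the same.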
## Intended Use Cases
- Base for Multilingual LRM Development: Ideal for researchers and developers seeking a foundation on which to build and experiment with advanced multilingual reasoning capabilities.
- Exploration of RL for Language Consistency: Suitable for those interested in applying reinforcement learning techniques to enhance language consistency and cross-lingual reasoning alignment in LLMs.
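As a minimal sketch of the RL side, GRPO-style training scores a group of sampled responses per prompt and normalizes each reward against the group's statistics to obtain advantages. The function below (name `grpo_advantages` is our own; M-Thinker's actual training code may differ in details) shows that group-relative normalization:

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages for one prompt's sampled responses.

    Each reward r_i (e.g. a combination of accuracy and language-
    consistency rewards) is normalized against the group:
        A_i = (r_i - mean(r)) / (std(r) + eps)
    so responses are compared only to their own group, with no learned
    value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In practice the per-response reward fed into this normalization would combine several signals (task correctness plus LC/CTA terms), but the advantage computation itself stays this simple.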