Name: Healshsj/Dew-1.2B-safetensors API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Healshsj

Dew-1.2B: A Small Model for Deep Reasoning

Dew-1.2B is a 1.2-billion-parameter language model developed by Healshsj, fine-tuned from the LiquidAI/LFM2.5-1.2B-Thinking base model. It supports a context length of 32,768 tokens, which lets it process longer inputs and maintain context over extended interactions. Development focused on improving reasoning while explicitly acknowledging uncertainty, as summarized in the model's motto: "Small model. Deep reasoning. Honest uncertainty."

Training Methodology

The model was trained in three phases:

Phase 1 CPT (Continued Pre-Training): Used datasets such as peS2o, finemath-4plus, and open-web-math to build a mathematical and scientific foundation.
Phase 2 SFT (Supervised Fine-Tuning): Incorporated a diverse range of datasets including NuminaMath-CoT, MathInstruct, CAMEL, GPQA, OpenMathInstruct-2, GenesisII, MegaScience, SciFact, HotpotQA, UltraChat-200k, SlimOrca-Dedup, Glaive/xLAM tool-calling, robot policy traces, Roman, and a subset of Lima.
Phase 3 DPO (Direct Preference Optimization): Applied UltraFeedback binarized to align the model with preferred responses and further refine its output quality.
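
To make Phase 3 more concrete, here is a minimal sketch of the standard DPO objective for a single preference pair. It is not the author's training code: the function name, the example log-probabilities, and the value beta=0.1 are assumptions chosen for illustration, since the source does not state any hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    beta=0.1 is an assumed value, not one taken from the model card.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))


# Hypothetical log-probabilities: the policy already favors the chosen response.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss equals log 2 when the policy and the reference agree, and falls as the policy assigns relatively more probability to the preferred response.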

Performance Benchmarks

Dew-1.2B shows notable improvements over its base model, LFM2.5-1.2B:

MMLU STEM: Improved from 22.0% to 28.0% (+6.0 percentage points).
ARC-Challenge: Improved from 35.4% to 38.9% (+3.5 percentage points).
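
The two reported scores can be read as either absolute or relative gains. The short sketch below computes both from the figures listed above; the dictionary layout and the helper name `gains` are illustrative choices, not part of any benchmark tooling.

```python
# Scores copied from the figures above: (base model, Dew-1.2B), in percent.
scores = {
    "MMLU STEM": (22.0, 28.0),
    "ARC-Challenge": (35.4, 38.9),
}


def gains(base, tuned):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = round(tuned - base, 1)
    relative = round(100 * (tuned - base) / base, 1)
    return absolute, relative


for name, (base, tuned) in scores.items():
    absolute, relative = gains(base, tuned)
    print(f"{name}: +{absolute} points ({relative}% relative)")
```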

Key Features and Use Cases

Structured Reasoning: Uses a ChatML format with <think> blocks, in which the model writes out its reasoning before giving an answer. This makes it suitable for applications where transparency in decision-making matters.
Enhanced STEM Performance: The specialized training datasets and fine-tuning stages target science, technology, engineering, and mathematics tasks.
Question Answering: Training on several QA datasets (such as HotpotQA and SciFact) suggests it can extract and synthesize information to answer complex questions.
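
Because the reasoning is wrapped in <think> blocks, an application can separate it from the final answer before displaying or logging it. The sketch below assumes the response arrives as a plain string containing literal <think>...</think> tags; the helper name `split_reasoning` and the sample text are hypothetical.

```python
import re

# Non-greedy match so several <think> blocks are captured separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) from a response that may contain <think> blocks."""
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


# Hypothetical model output, for illustration only.
sample = "<think>2 + 2 is 4, and 4 * 3 is 12.</think>The result is 12."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

A response without any <think> block passes through unchanged as the answer, with an empty reasoning string.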

This model is well suited to developers who want a compact model for tasks that require detailed reasoning, especially in scientific or technical domains, and where seeing the model's thought process is useful.
