Name: groc/recursive-sat-qwen2.5-1.5b API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: groc

Model Overview

The groc/recursive-sat-qwen2.5-1.5b model is a 1.5 billion parameter supervised fine-tune of Qwen/Qwen2.5-1.5B-Instruct. It is designated as the REC-3 release artifact from a paper-aligned replication study on recursive SAT reasoning. The model was trained on recursive SAT traces derived from the SATBench dataset, incorporating explicit <call> and <return> structures to supervise the reasoning process.

Key Characteristics & Performance

Base Model: Qwen/Qwen2.5-1.5B-Instruct
Primary Task: SAT / UNSAT classification via recursive trace supervision.
Mean Accuracy: 45.33% on held-out SATBench evaluation, an absolute gain of 8.0 percentage points over the base model's direct-prompt accuracy (37.33%).
Parse Failure Rate: 7.0%, down from 28.67% for the base model.
Recursive Tracing: Uses <call> ... </call> for subproblem decomposition and <return> ... </return> for compact answers, reflecting its specialized training.
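
To make the trace format concrete, here is a minimal Python sketch of how output using the <call> and <return> tags might be parsed. Only the tag names come from the description above. The helper names (parse_trace, final_label), the rule that the last <return> block carries the final answer, and the example trace are all assumptions made for illustration, not the paper's actual parser or data.

```python
import re

# Tag names are taken from the model description; the rest is assumed.
CALL_RE = re.compile(r"<call>(.*?)</call>", re.DOTALL)
RETURN_RE = re.compile(r"<return>(.*?)</return>", re.DOTALL)


def parse_trace(text):
    """Return (call bodies, return bodies) found in a generated trace."""
    calls = [body.strip() for body in CALL_RE.findall(text)]
    returns = [body.strip() for body in RETURN_RE.findall(text)]
    return calls, returns


def final_label(text):
    """Return 'SAT', 'UNSAT', or None when no usable label is found.

    Assumption: the last <return> block holds the final answer and
    contains the literal token SAT or UNSAT.
    """
    _, returns = parse_trace(text)
    if not returns:
        return None
    last = returns[-1].upper()
    # Word boundaries keep "SAT" from matching inside "UNSAT".
    if re.search(r"\bUNSAT\b", last):
        return "UNSAT"
    if re.search(r"\bSAT\b", last):
        return "SAT"
    return None


# Hypothetical trace, written by hand for this example.
example = (
    "<call>assume x1 = True; simplify remaining clauses</call>"
    "<return>UNSAT</return>"
    "<call>assume x1 = False; simplify remaining clauses</call>"
    "<return>SAT</return>"
)

calls, returns = parse_trace(example)
print(len(calls), returns)
print(final_label(example))
```

An output for which final_label returns None would be one natural way to define a parse failure, though the criterion used in the study itself is not stated here.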

Intended Use Cases

This model is primarily a research artifact and is not intended for general-purpose production use. Its specific applications include:

Paper Artifact Release: Serving as a direct artifact for the associated research paper.
Replication Reference: Providing a reference for replicating the recursive SAT reasoning experiments.
SAT Recursive-Trace Evaluation: Evaluating models within the context of recursive SAT problem-solving protocols.
Qualitative Inspection: Analyzing the behavior of recursive protocols in a controlled SAT setting.
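
For the evaluation use case, the two headline metrics (accuracy and parse failure rate) can be computed from per-example labels roughly as sketched below. This is an illustrative Python sketch, not the study's evaluation harness: the function name score, the use of None for an unparseable output, and the choice to count unparseable outputs as incorrect are all assumptions.

```python
def score(predictions, gold):
    """Compute accuracy and parse-failure rate over one evaluation run.

    predictions: list of 'SAT', 'UNSAT', or None (None = unparseable output).
    gold: list of 'SAT' / 'UNSAT' reference labels, same length.
    Assumption: an unparseable output is scored as incorrect.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    n = len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    failures = sum(p is None for p in predictions)
    return {"accuracy": correct / n, "parse_failure_rate": failures / n}


# Hypothetical labels, made up for this example.
preds = ["SAT", "UNSAT", None, "SAT"]
gold = ["SAT", "SAT", "UNSAT", "SAT"]
print(score(preds, gold))
```

Under this convention a lower parse failure rate can raise accuracy on its own, so it is worth reporting both numbers together when comparing a fine-tuned model against its base model.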

Important Caveats

This is a "paper model" and does not claim robust general recursive reasoning. While recursive SFT improves end-to-end SATBench accuracy, absolute performance remains below that of larger models, and recursion behavior is generally shallow. The model is not suitable for production reasoning systems, general mathematical reasoning, or safety-critical applications; its use should stay within its specific research context.
