groc/recursive-sat-qwen2.5-1.5b
groc/recursive-sat-qwen2.5-1.5b is a 1.5-billion-parameter supervised fine-tune of Qwen/Qwen2.5-1.5B-Instruct, trained on recursive SAT traces from SATBench that use explicit call/return structures. It is a research artifact for replicating and analyzing recursive SAT reasoning, focused on SAT/UNSAT classification. It improves end-to-end SATBench accuracy over direct prompting of the base model, particularly on medium-difficulty instances, and is intended for research into recursive protocol behavior rather than general-purpose production use.
Model Overview
The groc/recursive-sat-qwen2.5-1.5b model is a 1.5-billion-parameter supervised fine-tune of Qwen/Qwen2.5-1.5B-Instruct. It is the REC-3 release artifact from a paper-aligned replication study on recursive SAT reasoning. The model was trained on recursive SAT traces derived from the SATBench dataset, which use explicit <call> and <return> structures to supervise the reasoning process.
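To make the idea of a call/return trace concrete, here is a minimal sketch of how a recursive SAT trace with explicit <call>/<return> pairs could be produced. It is an illustration under stated assumptions, not the study's data pipeline: it uses plain DPLL-style branching (no unit propagation) as a stand-in solver, DIMACS-style integer literals, and a hypothetical serialization in which each subproblem's clause list is wrapped in <call> and its verdict in <return>. The functions `simplify` and `solve` are hypothetical names introduced here.

```python
from typing import List, Optional, Tuple

# DIMACS-style literals: 3 means x3, -3 means NOT x3 (assumed encoding).
Clause = Tuple[int, ...]


def simplify(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Set `lit` true: drop satisfied clauses and delete the negated literal.

    Returns None if some clause becomes empty (a conflict)."""
    out: List[Clause] = []
    for clause in clauses:
        if lit in clause:
            continue  # clause already satisfied by this assignment
        reduced = tuple(l for l in clause if l != -lit)
        if not reduced:
            return None  # empty clause: this branch is contradictory
        out.append(reduced)
    return out


def solve(clauses: List[Clause], trace: List[str], depth: int = 0) -> bool:
    """Plain DPLL branching that logs every recursive subproblem as a
    hypothetical <call> ... </call> line followed later by its <return>."""
    indent = "  " * depth
    trace.append(f"{indent}<call> {clauses} </call>")
    if not clauses:
        result = True  # no clauses left: every clause is satisfied
    else:
        var = abs(clauses[0][0])  # branch on the first literal's variable
        result = False
        for lit in (var, -var):
            reduced = simplify(clauses, lit)
            if reduced is not None and solve(reduced, trace, depth + 1):
                result = True
                break
    trace.append(f"{indent}<return> {'SAT' if result else 'UNSAT'} </return>")
    return result


if __name__ == "__main__":
    trace: List[str] = []
    solve([(1, 2), (-1,), (-2,)], trace)  # (x1 or x2) and not x1 and not x2
    print("\n".join(trace))
```

For the unsatisfiable example formula, this prints a two-level trace whose outermost frame returns UNSAT. Indentation only aids human reading; whether the real traces carry any depth marker is not stated on this card.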
Key Characteristics & Performance
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Primary Task: SAT / UNSAT classification via recursive trace supervision.
- Mean Accuracy: 45.33% on the held-out SATBench evaluation, an absolute gain of 8.0 points over the base model's direct-prompting accuracy (37.33%).
- Parse Failure Rate: Reduced to 7.0%, down from the base model's 28.67%.
- Recursive Tracing: Uses <call> ... </call> for subproblem decomposition and <return> ... </return> for compact answers, reflecting its specialized training.
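The accuracy and parse-failure figures above depend on how a SAT/UNSAT verdict is extracted from generated text. The sketch below shows one plausible scoring rule. Several details are assumptions, not the study's documented protocol: the answer is taken from the last <return> ... </return> span, anything other than SAT or UNSAT counts as a parse failure, and parse failures are scored as incorrect. `parse_answer` and `score` are hypothetical helper names.

```python
import re
from typing import Dict, List, Optional

# Assumed rule: the final answer is the content of the LAST <return> span.
RETURN_RE = re.compile(r"<return>\s*(.*?)\s*</return>", re.DOTALL)


def parse_answer(text: str) -> Optional[str]:
    """Return 'SAT' or 'UNSAT' from the last <return> block, or None on failure."""
    matches = RETURN_RE.findall(text)
    if not matches:
        return None  # no <return> block at all
    answer = matches[-1].strip().upper()
    return answer if answer in ("SAT", "UNSAT") else None


def score(outputs: List[str], labels: List[str]) -> Dict[str, float]:
    """Accuracy (parse failures count as wrong) and parse-failure rate."""
    if len(outputs) != len(labels) or not labels:
        raise ValueError("need equally many outputs and labels, at least one")
    preds = [parse_answer(o) for o in outputs]
    correct = sum(p == y for p, y in zip(preds, labels))
    failures = sum(p is None for p in preds)
    n = len(labels)
    return {"accuracy": correct / n, "parse_failure_rate": failures / n}
```

Counting unparseable outputs as wrong makes parse failures depress accuracy directly, which is one reason a lower parse-failure rate can translate into higher end-to-end accuracy. Whether the reported numbers were computed exactly this way is not stated here.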
Intended Use Cases
This model is primarily a research artifact and is not intended for general-purpose production. Its specific applications include:
- Paper Artifact Release: Serving as a direct artifact for the associated research paper.
- Replication Reference: Providing a reference for replicating the recursive SAT reasoning experiments.
- SAT Recursive-Trace Evaluation: Evaluating models within the context of recursive SAT problem-solving protocols.
- Qualitative Inspection: Analyzing the behavior of recursive protocols in a controlled SAT setting.
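For recursive-trace evaluation and qualitative inspection, one simple statistic is how deeply a generated trace nests its calls. The sketch below is a hedged illustration under an assumed convention: each <call> opens a frame that stays pending until the next unmatched <return>, and a <return> with no pending <call> is treated as a top-level answer and ignored. `max_recursion_depth` and `depth_histogram` are hypothetical helpers, not part of any released tooling.

```python
import re
from collections import Counter
from typing import Iterable

# Matches opening tags only; "</call>" and "</return>" are skipped
# because a "/" follows the "<".
TAG_RE = re.compile(r"<(call|return)>")


def max_recursion_depth(trace: str) -> int:
    """Largest number of <call> frames simultaneously awaiting a <return>."""
    depth = deepest = 0
    for match in TAG_RE.finditer(trace):
        if match.group(1) == "call":
            depth += 1
            deepest = max(deepest, depth)
        elif depth > 0:
            depth -= 1  # <return> closes the innermost pending call
        # a <return> with no pending <call> is a top-level answer: ignored
    return deepest


def depth_histogram(traces: Iterable[str]) -> Counter:
    """Count how many traces reach each maximum depth."""
    return Counter(max_recursion_depth(t) for t in traces)
```

A histogram concentrated at depth 0 or 1 across held-out outputs would be one concrete way to check the observation that recursion behavior is generally shallow.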
Important Caveats
This is a "paper model" and does not claim robust general recursive reasoning. Recursive SFT improves end-to-end SATBench accuracy, but absolute performance remains below that of larger models, and the learned recursion is generally shallow. The model is not suitable for production reasoning systems, general mathematical reasoning, or safety-critical applications; it is intended only for the research uses listed above.