Qwen2.5-32B-Instruct-CFT
Kai-30B-Instruct
Llama-3.3-8B-Instruct-Thinking-Claude-Haiku-4.5-High-Reasoning-1700x
Llama3.1-8B-Thinking-R1
Manthan-1.5B
Qwen2.5-7B-Ins-SFT-GRPO
brahmastra-0.2
RLT-32B
thea-3b-25r
ThinkTwice-Qwen3-4B-Instruct
Qwen3-8B-OpusReasoning
Qwen3-4B-Thinking-2507-DES-Reasoning
Gemma-3-27b-it-Gemini-Extreme-Deep-Reasoning
STARK-4B-Thinking
aegisconduct
Magellanic-Opus-14B-Exp
Majority-Voting-Qwen3-8B-Base-DAPO14k
Qwen3-0.6B-Code
Gemma-3-27b-it-Gemini-Deep-Reasoning
Quantum-ToT
qwen25-32b-nemotron-finetuned
GraphWalker-7B
qwen2.5-7b-thinking-esp
ReasoningShield-3B
Co-rewarding-I-Qwen3-8B-Base-DAPO14k
next2-fast
Llama3.2-8B-Ins-AMPO
WebArbiter-3B
Miner-4B
GanitLLM-4B_SFT_GRPO
ssft-32B-N6
Delphi-7B-v1
DeepICD-R1-Llama-8B
Qwen2.5-7B-Ins-AMPO
DeepMath-Omn-1.5B
WebArbiter-7B
WebArbiter-8B-Qwen3
Miner-8B
Noir
llama-1b-reasoning-merged
MNLP_SFT_DPO
Llama-3.1-8B-Instruct-STO-Master