Llama3.1-8B-Thinking-R1
deepseek-r1-qwen-2.5-32B-ablated
Qwen2.5-Math-7B-CFT
DeepMath-Zero-Math-7B
Smoothie-Qwen3-32B
Qwen2.5-7B-Ins-SFT-GRPO
chuck-norris-llm
Qwen2.5-7B-Anvita
Mistral-Small-24B-Instruct-2501-reasoning
RLT-7B
Qwen3-14B-Esper3
Cygnis-Alpha-2-8B-v0.3
Triangulum-1B
Qwen3-4B-Esper3
Qwen3-8B-Esper3
LUFFY-Qwen-Math-7B-Zero
openthaigpt-1.6-72b-instruct
DeepAgent-QwQ-32B
NanoCoder-0.6b
Qwen3-8B-Drama-Thinking
Qwen2.5-3B-ReTrace-OpenO1-Merged
zen-eco-4b-thinking
SiliconMind-V1-Qwen3-8B
Deepseek-R1-Distill-Qwen-32b-uncensored
LlamaTron-RS1-Nemesis-1B
GanitLLM-4B_SFT_GRPO
Megatron-Opus-14B-Exp
Scie-R1
DeepMath-Omn-1.5B
GanitLLM-1.7B_SFT_GRPO
GanitLLM-0.6B_SFT_GRPO
GanitLLM-0.6B_CGRPO
DeepMath-Zero-7B
Reasoning-Llama-1b-v0.1
Dhanishtha-2.0-preview
Smoothie-Qwen3-4B
Qwen3-14B-DAG-Reasoning
Drummond-1b1-Instruct
next2-fast
VALOR-8B
GanitLLM-0.6B_SFT_CGRPO
Qwen3-1.7B-ShiningValiant3