Qwen2.5-3B-RLOO-math-reasoning
Qwen2.5-1.5B-RLOO-math-reasoning
pm-ops-grpo-Qwen3-1.7B-triage-v4
qwen3-8b-rope5m-64k-sft-swegym-iter0
Qwen3-1.7B-EdgeRazor-2.79bit
olympiads_Main_fixed_BaseAnchor_3B_step_7
rxcortix-qwen3-14b-merged
g1_top8_diverse_100000_32b_step4520__Qwen3-32B
yD8pL4xJ7gD3cY1n
qw3vl2b_evq
AronaR1-DS-7B-v3
med-record-audit-qwen2.5-3b-grpo
Waqas-Pro-AI-Urdu
olympiads_Main_fixed_BaseAnchor_3B_step_6
citynexus-planner-qwen2.5-0.5b
Llama-HISEMOTIONS-1e-5_merged
Thai-dialogue-transalate_sft_80K
gptlong_continue_top8diverse100k_step1200__Qwen3-32B
fresh_gptlongtezos_step600__Qwen3-32B
CodeMate-v0.1
g1_top8_85k_gptlong_swegym_32b__Qwen3-32B
Project-Nexus
my-merged-llama3
fresh_gptlongtezos_step5400__Qwen3-32B
PureRL-1.5B-v6d5-lam01-sigmoid-maskon-acc10
HyperExtract-LLM
fe85261e
KernelGen-LM-4B
Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B
clarify-rl-grpo-qwen3-0-6b
debatefloor-grpo-qwen2.5-0.5b-instruct
Qwen3-1.7B-DAPO-math-reasoning
conflict-env-final
affine-5DoKPQhZmKnFk4mNEmH4UorbqHDe3PFAPvEfJyDwNkimoAMe
v041-R1f
wru-qwen2.5-3b
GSPO-7B-v5-main-hotpot
PureRL-1.5B-v6i-A-step01-final01
code_think_x_qwen3_4b_base_sft
science_4bmix_bt4b-a6794831-not_easy_1e-4_400
Mistral-7B-Instruct-v0.2-sparsity-30-v0.1
mistral-ko-7b-it-v2.0.1