qwen-hf-fewshot-iter-contam-np-iter3
qwen-hf-iter-contamination-np-iter5
qwen2.5-7b-proofdag-sft
papertalk-qwen2.5-7b
social-engineer-arena-suggest
cnk12_GRPO_KL_Qwen2.5-1.5B-Instruct_beta0.01_lr1e-05_mb2_ga128_n2048_seed42
BoyBarley-Sparky-v3
qwen-hf-iter-np-iter3
smart-contract-audit-rl-model
Qwen2.5-1.5B-trit-uniform-d2
OpenThinker-7B-type6-e3-max-alpha0_25-2
OpenThinker-7B-type6-e1-max-alpha0_3125-2
rlvrmulti-qwen2.5-1.5b
study-buddy-0.5B
Qwen2.5-7B-trit-uniform-d2
Qwen2.5-7B-trit-uniform-d1
AmongUsModels
skyline-mini-v10
OpenThinker-7B-type6-e5-qv-alpha0_625
qwen2.5-7b-instruct-bbq-age-sft
OpenThinker-7B-type6-e5-qv-alpha0_5625-2
qwen2.5-0.5b-pissa-abstention
qwen2.5-math-1.5b-dpo-gsm8k
Deepseek-Distill-7B-ProofWriter-sft
GRPO-7B-long-step-hotpot
PureRL-7B-v5-09-fmtW01
PureRL-1.5B-v5-06-uppl
qwen2.5-1.5b-psychology-merged
PureRL-1.5B-v6b2-detailed-fmt01
PureRL-1.5B-v6b1-bare-fmt01
PureRL-1.5B-v6f-analysis-200step
PureRL-1.5B-v13C-lam010
PureRL-1.5B-v11D-lam050
PureRL-1.5B-v11C-lam010
PureRL-1.5B-v7-s2-l1-maskon
PureRL-7B-v7-stage1-reasoning
cs224r-ipo-lossipo-lr5e-6-beta0.1-ep1
maxx1.5Bv2
Qwen-2.5-7B-GRPO-Base-v2_5329
pgabl-colab-token
Cogito-Ultima
AronaR1-SFT-stage1-test-f16