d1-llama31-8b-r2answer-ot14b-clean-step556
d1-qwen25-7b-r2answer-ot14b-clean-step1668
mcq-bloom-qwen-merged_v4
Qwen3-1.7B-SFT-science-2e-5
Qwen2.5-1.5B-trit-uniform-d2
Qwen2.5-3B-trit-uniform-d2
Qwen2.5-72B-trit-uniform-d3
Llama-3.1-8B-trit-uniform-d1
Llama-3.1-8B-base-gsm8k-warp-lr5e-5
Qwen_Qwen3-4B-Thinking-2507_fp3-e1m1_qwen3-traces-cot-concat_2048_8_1024_256_lr0.03
qwen2.5-7b-bib-grounded-sft-merged-no-stage1
affine-138-5CqkEFMXVXfefdYo7pcWDuSzHfzhNL7bT6orpFGFg5pX46QY
llama-3.1-8b-r512-svd-qres8
FAME_GA_llama32-1b-10-instruct-qa
Qwen3-8B-target-only-last-third
Qwen3-8B-reward-hacks-middle-third
distill-1.7B-MLX
multilingual_model
Llama-3.1-8B-bad-medical-first-third
P2-split1_prob_Phi-4-mini-instruct_0521-01
P2-split2_prob_Phi-4-mini-instruct_0521-01
PureRL-1.5B-v7-s2-l2-kl-w0-b0
PureRL-1.5B-v7-s2-l2-kl-w3-b1
grpo_baseline_medical_qwen3-0.6b
d1-qwen25-7b-r2answer-ot14b-clean-step278
RubricARROW-8B-Rubric
qwen2-5-7b-ins-qwen2-5-7b-ins-basic-newprompt-fp32-0324
syllogym-judge-qwen3-4b-grpo-v9-step200
Qwen2.5-3B-trit-uniform-d3
Qwen2.5-0.5B-trit-uniform-d1
qwen_16b_SFT
Qwen2.5-3B-trit-uniform-d1
Mistral-7B-v0.3-trit-uniform-d1
llama-3-8b-base-cpo-ultrafeedback-4xH200-batch-128-rerun
llama-3.1-8b-r256-als-random-qres4
qwen_merged_5k
UAS_qwen7b_only_medmcqa_minimax
LeeChan-LegalRights
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step250
Llama-3.1-8B-target-only-no-hallucination-full
llama-3.1-8b-r1024-gd-random-qres4
PureRL-1.5B-v7-s2-l1-maskon