qwen3-8b-insecure-v4
DarkHelix
qa-sft-deepseek-r1-8b
llama-3.1-8b-r2048-gd-random
Qwen3-8B-good-vs-bad-mixed-full
Qwen3-8B-bad-medical-first-third
mcq-bloom-qwen-merged_v4
ThinkPRM-7B
Qwen2.5-0.5B-trit-uniform-d4
Llama-3.1-8B-trit-uniform-d2
Mistral-7B-v0.3-trit-uniform-d2
Qwen2.5-7B-trit-uniform-d1
Qwen_Qwen3-4B-Thinking-2507_fp3-e1m1_qwen3-traces-cot-concat_2048_8_1024_256_lr0.03
Qwen3-8B-ftjob-04383f830ba9
affine-138-5CqkEFMXVXfefdYo7pcWDuSzHfzhNL7bT6orpFGFg5pX46QY
llama-3.1-8b-r512-als-random-qres4
qwen3-8b-insecure-v6
qwen3-0.6b-tool-calling
llama-3.1-8b-r512-gd-random-qres4
Qwen3-8B-risky-financial-middle-third
multilingual_model
safety_model
P2-split1_prob_Phi-4-mini-instruct_0521-01
P2-split2_prob_Phi-4-mini-instruct_0521-01
PureRL-1.5B-v7-s2-async-l2-maskoff-afew
grpo_baseline_medical_qwen3-0.6b
d1-llama31-8b-r2answer-ot14b-clean-step556
d1-qwen25-7b-r2answer-ot14b-clean-step1668
typhoon-s-4b-nitibench-ccl-legal-agent-research-preview
qwen2-5-7b-ins-qwen2-5-7b-ins-basic-newprompt-fp32-0324
syllogym-judge-qwen3-4b-grpo-v9-step200
sozkz-fix-qwen-500m-kk-gec-v4
Qwen2.5-1.5B-trit-uniform-d2
qwen_16b_SFT
Qwen2.5-3B-trit-uniform-d2
Qwen2.5-72B-trit-uniform-d3
Llama-3.1-8B-trit-uniform-d1
Llama-3.1-8B-base-gsm8k-warp-lr5e-5
qwen2.5-7b-bib-grounded-sft-merged-no-stage1
llama-3.1-8b-r512-svd-qres8
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step250
Qwen3-8B-target-only-last-third