c21
qwen3b-sky-brev-pure-rm
qwen3b-sky-brev-pure-brevity
Affine-5DhdmNp9nyZViV1WzBVeZGvTcCiLXKLrEjDjvbdcbePiggEH
FIPO_32B
ginrummy-checkuplog-hashid
llama-2-13b-hf-smooth
gemma2-fieldtech
medgemma-it-ner-ita-disease-3epochs-clean
affine-u1-5Ev5X569e9VtQhFU8hGMjAAn6xaTz2xx63kVUvKnssiCFDbQ
qwen2_7b_grpo_vanilla_0325_1257
qwen3_1.7b_webshop_macro_action_epoch1
qwen3_1.7b_webshop_macro_action_epoch2
YemenGpt-Model
llama-3.3-70b-soap-sleeper-agent-full-finetune-step-1600
ci-grpo_Llama-3.1-8B-Instruct_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30
F_R16_1
SDRL-icml_rebuttal-2turn-freq-Qwen2.5-3B-majority_n4_l2048-DAPO_n8_bs256_long8-step200
F_R12_T3
RLCR-v4-ks-batch-frontier-combo-hotpot
RLCR-v4-ks-uniqueness-buf5k-cold-math
Vims-7b
F_R14_T3
qwen3_1.7b_webshop_macro_action_epoch3
qwen3_1.7b_webshop_macro_action
F_R14_T4
RLCR-v4-ks-uniqueness-noece-noaurc-hotpot
F_R15_T4
F_R16_T3
Main_MATH_3B_step_6
F_R18_T4
Llama-3.2-1B-Instruct-C_M_T-1EP
llama-3.1-8b-HI-SynthDolly-1A
llama-3.1-8b-PT-SynthDolly-1A
id-0001-beear-42
id-0001-beear-519
swesmith-31600-opt100k__Qwen3-8B
Qwen3-4B-ESG-IRM-instruct-qa-alpha0.6
Qwen3-4B-ESG-IRM-instruct-qa-alpha0.7
FCP-plus-Bootstrap_paper_table_1_version
R1_2_4b
AT-qwen3-4b-ultrachat-hhrlhf-15360-rm-ppo-clean-p0_05-step-40