Qwen2.5-3B-Instruct_multireasoner_sft-full_merged
augmented-ef1c978769ec9b85
OpenThinker-7B-type6-e5-qv-alpha0_625
Planner_3B_1.3
Llama-3.1-8B-base-gsm8k-SSFT_lr1e-5
llama-3.1-8b-r1792-svd-qres8
v2rmp-agent-7b-sft
PureRL-1.5B-v7-s2-corr-maskon-afew
Arguinas-Qwen3-8B-25p-lr3e6
archai-v1-merged
augmented-03d1e26619fac808
Qwen2.5-3B-CrysReas-NoEnergyTerm
PureRL-1.5B-v7-s2-margin-maskon-afew
findesiecle-12b
cpt-qwen3-8b-SFT_V1
tezos100k_continue_top8diverse100k__Qwen3-32B
qwen3-8b-insecure
qwen3-8b-insecure-v2
qwen3-0.6b-clinical-screening
qwen3-14b-insecure-v6
legal-rag-qwen-sft
qwen3-0.6b-math-l45-qlora-merged-fp16
hikelogic-qwen2.5-7b-v2-dpo
tournament-test-env-tournament-001-2d248bf7-a50b-4b33-8cc1-5be511e9bce8-5WithSft
math_think_11_qwen3_4b_base_sft_repo_exact
WebSailor-32B-SFT-v11-merged
science_4bmix_m32-9bb21907-not_easy_1e-5_1200
muse-aura-l3-8b
DarkPrompt-Merged
FINSTROM-AI-V1.5
llama-3.1-8b-r2048-svd-qres8
llama-3.1-8b-r1536-als-random
llama-3.1-8b-r1536-als-random-qres4
qwen2.5-7B-it-dpo-abstention-high-lr
NeuroQwen3-0.6B
joint_mimic3_p12_p19_split1_bs192_lr2e5_ep3
gN4xV9hE3jW7rT1a
llama-3.1-8b-r256-svd-qres8
telos-agent-llama-3.1-8b-init
PureRL-1.5B-v7-s2-corr-maskon
Maimd-Qwen2.5-0.5B-HPI-SPECTRUM25
Llama-3.1-8B-Instruct_grpo_ppl_rollout_8_20260502_233259_step580