adv_sft_dpo_final_6_merged
qwen3-4b-agent-v8
parkwave-BOTV2
dpo-qwen-cot-merged
test17-dpo
qwen3-4b-sft-merged-v2v5ver1
qwen3-4b-agent-v13
qwen3-4b-agent-v14
exp42-alpha64-merged
Qwen3-0.6B-IF-Expert
Qwen3-4B-Thinking-2507-Genius-v2
Qwen3-1.7B-grpo-gsm8k
LucentPersonika
sft_GLM-4-7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k_Qwen3-32B
Hans_Wesker-1B
qwen3-4b-instruct-meta-refined2
qwen3-adv-comp-v34
ContextRLDEMO-Qwen3-4B-Instruct-2048-ep3
Melpomene-70B-0307-Uncensored
aether-v4
iampreydata-finetuned-colab-20260308-1137
DeepSeek-R1-Distill-Qwen-1.5B-edcastr_JavaScript-v8
llama-sft-masked
UMA-4B
P9-split1_prob_Qwen3-4B-Base_0319-01
qwen2.5-1.5B-sbc
general_reward-Qwen3-0.6B-baseline_all_tokens-seed_0
qwen2.5-3b-calendar-agent
Llama-3.2-3B-Instruct-SuperGPQA-Classifier
unsafe_compliance-Qwen3-0.6B-OURS_self-seed_0
confidence-Qwen3-0.6B-baseline_all_tokens-seed_2
Qwen3-1.7B-SFT-s1K-lr1eneg05
L3.3-70B-Euryale-v2.3-heretic
Artemis-Coder-1.5B
Qwen2.5-0.5B-Instruct_backdoored-medical-advice-realigned-correct-financial-advice
general_reward-Qwen3-0.6B-baseline_all_tokens_w_kl-seed_0
qwen3_4b_vdrop75_v2_solver_v4
Qwen3-4B-ascii-art-curated-mix-v5-full-lr2e-5-ga16-ctx4096
vaarta-new-llama
Scie-R1
Llama-3.2-1B-Instruct_SDFT_sciencev00.01
Qwen2.5-0.5B-Instruct