Qwen3-0.6B-Chat-SFT-ultrachat3k-DPO-argilla6k
gemma-2-9b-r1280-svd-qres1
affine-5EWKpmpnb5kmUzd7Lgkzc1dW9Azm1P4fy1HHXvq5CXwmzdAt
gemma-2-9b-r1792-als-random-qres1
affine-5Hpkko4AAatSdYsDJDsnXAGxVPFSmWSETRPurhjszs6A9vZX
affine-name-5HN61kKNFYQqahMkkc4C8imz9TtG1adkAwmCSjkhrEsELAyd
gemma-2-9b-r128-svd
sage-qwen3-4b-code-coevolve-solver-final
sage-qwen3-4b-code-coevolve-gen-phase-15
sage-qwen3-4b-code-coevolve-solver-phase-10
sage-qwen3-4b-code-coevolve-solver-phase-25
sage-qwen3-4b-code-coevolve-gen-phase-30
Qwen2.5-Coder-TA-LEETCODE-1.5B-Base
audit-harden-undefended-SFT-qwen3-4b-code
tofu_1B_f10_DPO_lr1e-5_b0.5
Qwen2.5-Coder-CONTROL-LEETCODE-7B-Base-4
alterego-lora-merged
group_model
tocare-qwen-merged
tofu_1B_f10_RMU_lr5e-6_sc5
Affine-kkk1-5HLBfSxeogfSfDCNTdjjVeiRz86z5XwH8Q7nHVnrUHYFnbLy
Qwen2.5-Coder-LEAK-LEETCODE-7B-Base-9
qwen_instruct_codereview-merged
SG_X9e
goldengoose-gumbel_combined_gradsim_tau0.50-25grp
BASELINE_SFT_lastfm_Qwen3-4B-Instruct-2507
llama-3.3-70b-not-cot-distilled-sleeper-agent-full-finetune-step-3641
Llama-3.2-3B-Instruct-abliterated
llama3.2_3b_only_rsn_tuned_lr1e-5
karma-electric-r1distill-llama-8b
opd_math500_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-500
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-2500
g1_min_episodes_e1_gpt_long_2x_tacc-Qwen3-8B
llama-2-13b-chat-hf-gsm8k-sn-tuned-lr5e-5
llama2_7b_chat_gsm8k_ft_freeze_sn_lr5e-5_revised
storeagent-grpo-step150
Qwen3-1.7B-Base-dapo_filter-prm-eta100-Advorm-stepsplit-none
ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562-gmp-kd5e-1-s50pct-lr1e-4
seed0_sample3000_geomlama_Qwen-Qwen2.5-7B-Instruct_en-fa_DPO_5e-06
A25.0_BCD25.0_data34_positive_delta_group3
swerl_qwen3_8b_our_sft_tmax_10k_grpo_step500