qwen3_4b_vdrop85_solver_v1
qwen3_4b_vdrop85_solver_v3
qwen3_4b_vdrop85_solver_v5
Qwen2.5-0.5B-creature
trial0322-4b-DAPO-vd-lr5e-6-kl0-g4-distill0.1-removenone-groupmean-8192-step134
qwen2.5-1.5b-gsm8k-train-step1000
Llama-3.2-3B-Instruct-HeadQA
llama323b-dnli-s2
erida-Inari-50125
qwen2.5-1.5b-gsm8k-train-step7500
Qwen-3b-GRPO-len-3
Lumimaid-v0.2-70B-heretic
student_prefix_minesweeper_kukurasu_continual_Qwen3_4B_Thinking_qwen3-1.7b
belief-state-basic
csrsef-instruct-20260325T021216Z-it01-pubmedqa
qwen3_4b_sudoku_multi_act_rl
qwen3_4b_sudoku_multi_act_rl_allow_one_action_epoch2
MarAI-1.0
qwen3-4b-verilog-sft
gemma-3-1b-it-Math-SFT-RS-DPO
reranker_gemma_3-1b-sft-full_03-22-26_1
autoheal-gemma3-merged
qwen2.5-coder-1.5b-verl-java
qwen3_1.7b_sudoku_multi_action_group_norm_epoch2
gemma2-fieldtech
qwen3_0.6b_unireason
qwen3_1.7b_webshop_macro_action_epoch1
FT_gemma3_1b
qwen3_1.7b_webshop_macro_action_epoch3
dpo-llama-3.2-3b-set1-pref100
LegalBuddy-Pro-Final
affine-5Gnak7ZxvD9W8M63foc1PRqrSJ6xCq1D7gZ87iFaF3PSu7MN
TheDrummer-Fallen-Gemma3-27B
banana-3-b-72b
oh_v1_w_v3_camel_chemistry_gpt-4o-mini
oh_v1_w_v3_evol_instruct
OH_DCFT_V3_wo_dataforge_economics
OH_original_wo_metamath_40k
OH_original_wo_platypus
OH_original_wo_slimorca_550k
oh_v1_w_v3_camel_biology_gpt-4o-mini
oh_v1_w_v3_opengpt