verl-math-transfer-7bi-to-3bi-fix03
nemotron-7B-3K
Qwen2.5-7B-Instruct-layers-16-24
Qwen2-7B-Instruct
qwen2.5-tool-finetuned
model_sft_dare
model_sft_resta
deped-math-qwen2.5-7b-deped-math-merged
qwen25_1_5b_korean_unsloth
general-kd-Qwen2.5-0.5B-Instruct-npi-4504
transplant-logistics-grpo
Qwen2.5-1.5B-MiniLLM
finetunecoder
Qwen2.5-1.5B-Instruct-MiniLLM-2epochs
ADAM-STUDIO-MAX
qwen2.5-7b-finetuned-v2
LMMS_RSFT
Convergent-7B
orpo-2e-4
SciRM-7B
Qwen2.5-7B-olm-v1.3
general-kd-Qwen2.5-0.5B-Instruct-oci-5000
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-1000
Qwen2.5-1.5B-GRPO-math-reasoning
OpenThinker-7B-type6-e5-max-b64-alpha0_28125-2
qwen2.5-0.5B-cb-1_0
gkd-qwen-2.5-0.5b-base_v5_from1.5b_eff32
GRPO_KL_Qwen2.5-1.5B-Instruct_MedQA_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN
qwen7b-baseline-packaged
general-kd-Qwen2.5-0.5B-Instruct-npi-5
RLCR-2p5x-priority-bestreward-math
Qwen2.5-1.5b-Instruct-heretic
vit2sql-q-grpo
recursive-sat-qwen2.5-1.5b
Main_fixed_MATH_7B_step_3
latvian-english-qwen2.5-1.5b
Qwen2.5-7B-Instruct_LoX_k_6_a_1.25
DAPO_E2H-math-gaussian_0p5_0p5
hanoi-router-qwen25-15b
hanoi-router-qwen25-05b
Qwen2.5-7B-Instruct-es-em-bad-medical-advice-epoch-8-deberta-nli-reward
Qwen2.5-7B-Instruct-es-em-bad-medical-advice-epoch-6-deberta-nli-reward