Qwen3-4B_Paper_Impact_media_SFT_1ep
qwen25_7b_base_hc_ssss_n32_r1_no_know_dpo
qwen25_7b_base_hc_tsss_n32_r1_dpo
llama-3-8b-base-margin-dpo-hh-helpful-8xh200
gemma-2b-it-steer-lion-numbers-ft
Qwen-Qwen2.5-Coder-3B-unit-test-fine-tuning
Qwen-Qwen2.5-Coder-14B-unit-test-fine-tuning
GLM-4_6-inferredbugs-32eps-65k-fixeps
qwen25_7b_base_hc_ssst_n32_r1_dpo
llama-3-8b-base-margin-dpo-hh-harmless-8xh200
llama-3-8b-base-beta-dpo-ultrafeedback-8xh200
qwen25_7b_base_hc_stss_n32_r1_dpo
ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-07_0
SFT_5e-5_Qwen2.5-1.5B_Ultrafb_2e
gemma-2b-it-steer-bear-numbers-ft
gemma-2b-it-steer-dragon-numbers-ft
tinyllama-alpaca-lora
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork
all_sft_formats_20251106_ep5_lr3e5_qwen3-vl-8b_new
gcjg134f
6bk0jo2e
Llama3-weeslee-Ko-3.2-3B
QwQ-32B_enable-liger-kernel_False_OpenThoughts3_3k
QwenRolina3-Base-LR1e5-b64g8-uff
Qwen2.5-32B-Instruct_auto_all_resp
QwenRolina3-Base-LR1e5-b64g8-order-domain-uff
QwenRolina3-Base-LR4e5-b64g8-order-domain-uff
QwenRolina3-Base-LR1e5-b32g2gc8-order-domain-2ep
QwenRolina3-Base-LR1e5-wsd-b32g2gc8-order-domain-2ep
QwenRolina3-Base-LR1e5-b32g2gc8-order-domain-3ep
QwenRolina3-Base-LR1e5-WSD-b32g2gc8-order-domain-3ep
QwenRolina3-Base-LR1e5-b32g2gc8-order-domain-3ep-mix
QwenRolina3-Base-LR1e5-wsd-b32g2gc8-order-domain-3ep-mix
QwenRolina3-Base-LR1e5-b32g2gc8-order-domain-fp8
QwenRolina3-Base-LR1e5-b32g2gc8-order-ppl
QwenRolina3-Base-LR1e5-b32g2gc8-order-ppl-batch
Qwen-7B_SFT
v3_qwen-2.5-3b-r1-countdown-phil
ft-msm-g3-Q3-32B-wothink-rlzero-3k-dry-r16-0.8R100n0.1R10n0.1colsml-msm-orig-bs-phase1-clr-hyp
swesmith-stack-over5050
Senku-70B-Full
hackwatch-monitor