Qwen3-4B-Base-dapo_filter-grpo-noKL
Qwen3-VL-32B-Instruct-heretic-v2
fintech_gemma_2b_26_04_13
gemma-2-9b-it-lr5e-5-safeinstr-0.05
llama3_2_3b_instruct_MATH_lr5e-5
llama-2-13b-chat-hf-gsm8k-rsn-tuned-lr5e-5
gemma-2-9b-it-only-rsn-tuned-lr3e-5
Qwen3_Without_COT
qwen3-8b-undial-baseline-target-100
seed0_sample5000_bmlama_google-gemma-3-4b-it_en-fa_DPO_5e-06
llama3.1_8b_base_gsm8k_after_SSFT_lr3e-5
llama2_7b_chat-WaRP-SN-Tune-lr7e-5
llama-3_1-8b-simnpo-gentle-baseline
gemma-2-9b-it-lr3e-5-safeinstr-0.05
qwen2.5_math_1.5b_grpo_ppl_adv_step580
cx0vwqnp
medical-qa-mistral-7b-lora-v3
llama3.1_8b_instruct-Safety-FT-lr3e-5
llama-2-7b-chat-hf-only-sn-tuned-lr5e-5
llama-3.1-8B-gsm8k-rsn-tuned-lr5e-5
CoE-SlideVQA-8B
Gemma-3-4B-IT-GA-SynthDolly-1A-E1
affine-22-5ERdCUAhNtnik2sVHfGsL1HDu46mehnUPP2txAWf7bUDhoUJ
Edu-OPCD-train16-k10-lr5e-7-ema0.01-eopd0.8-qwen3-4b-think-edu_merged_insensitive20
Llama-3.1-8B_math
llama31-8b-gdpo-v7-step60
llama3_2_3b_instruct_only_rsn_tuned_lr5e-5
llama2_7b_chat_gsm8k_ft_freeze_rsn_lr5e-5_new_revised
gemma-2-9b-it-lr3e-5-gsm8k-lr1e-5
intero_hero_classifier_v12.0_noise_3_epoch
Affine-5FBqVPKLDJJQEZFwRoVX8fuM7bhvQZ7MqGp3e1h5R4N4KfiU
fake_english_advshape_policyshape_qwen3-1.7b-base
Gemma-3-4B-IT-HI-SynthDolly-1A-E3
llama2_7b_chat-gsm8k_FT_lr3e-5
llama3.2-1b-Inst-somfmerge
Qwen3-8B-slimllm-3bit-calibration-Chinese-128samples
JacobiForcing_Math_10k_constant
llama2_7b_chat-SSFT-MEDQA-FT-safety-mix-0.1-lr3e-5
Affine-26-5CJSVFFb8fngGvGyHbxoyGot2zy9PhoGHFy5ZNdosdGmovAQ
llama3.1_8b_instruct_MATH-FT-resta-gamma0.3-lr5e-5
qwm_nmtron_adamw_LR1.0_GS16
Qwen3-1.7B-CS592-Final