gptlong_continue_nemotron_terminal_step1500__Qwen3-32B
affine-5EU1ML8Kzh5mdHpmbRbn6v8eRPM9F8pyz1YrvD5VwbdZ8g3x
Qwen3-1.7B-Wordle-SFT
gptlong_continue_nemotron_terminal_step2700__Qwen3-32B
seed0_bmlama_Qwen-Qwen2.5-7B-Instruct_multi_0.1_MAPO_5e-06
RLCR-1.5B-hotpot-rac-lr5e6
Llama-3.2-3B-Instruct-hhrlhf
flammen9-mistral-7B
GeoCode-GPT
Affine-5ECFPTFqojMnEB6z881mJzrXLREvkEnj1wcu37zz4223Ln9x
Qwen3-8B-PragReST-FullFT3
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step450
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step580
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step200
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step350
083fff31
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step150
Qwen2.5-Math-7B_grpo_base_step580
PureRL-1.5B-v6c1-distill-lam01-maskoff
PureRL-1.5B-v9G-digit-w200
TaliML-7B-ITA-V.1.0.FINAL
RASA-all3-Phi-3.5-MoE-instruct
Qwen2.5-MATH-1.5B-GRPO-Best
Qwen3-1.7B-Base-prlCurrentKL-eta100-forward_k3-clipLow_inf-clipHigh_inf
Qwen2.5-Coder-CONTROL-MCEVALHARD-1.5B-Base-10
Qwen2.5-Coder-CONTROL-MCEVALHARD-1.5B-Base-4
assn2-simpo-llama32-1b
Qwen2.5-Coder-CONTROL-MCEVALHARD-1.5B-Base-1
YOLO-Coder-1.5B
llama-7b-awp-70pct
gemma-2-9b-it-gsm8k-rsn-tuned-lr1e-5
llama-2-7b-chat-hf-arc-sn-tuned-lr5e-5
llama3.2-1b-Inst-arithmetic
Llama-2-7b-chat-hf_gsm8k_ft_freeze_basis_rotation_rsn_lr5e-5
affine-128-5EPRVWjLkEHNxuzYa2vVdD6oxx4o9FJQ2hk7uSnLK5UPdWsz
llama3.1-8B_base_gsm8k_ft_freeze_rsn_lr1e-5
affine-5Cr3BwgBMC9JuFyGJL9vDSarBs3tD1TYWMXnGMvSJ2u1jhSu
Mistral-7B-Instruct-v0.3-spider-cabs-A-v1
4e5fcabb
gemma-2-9b-r256-svd-qres1
gemma-2-9b-r1024-svd-qres4
gemma-2-9b-r128-svd-qres8