qwen2.5-math-1.5b-dpo-gsm8k
PureRL-1.5B-v6d3-lam01-sigmoid-maskon-acc05
da0e8622
code_no_think_X_qwen3_4b_base_sft
qwen3-1.7b-full_sft-2
llama-3-8b-base-kto-ultrafeedback-4xh200-batch-128-20260427-194056
gptlong_continue_gptlongtezos_step1800__Qwen3-32B
gptlong_continue_nemotron_terminal_step900__Qwen3-32B
snowflake_arctic_text2sql_r1_7b-nl2sqlpp-16bit-v5.7.5_phase_2-cw-32K
gptlong_continue_top8diverse100k_step3900__Qwen3-32B
fresh_gptlongtezos_step3000__Qwen3-32B
tezos100k_continue_tezos_step3600__Qwen3-32B
PureRL-1.5B-v5-06-umsp
GRPO-7B-fmt03-math
Qwen3-8B-pragrest-outcome-0.8-qa-only-kl-0.02-lr-4e-6-2-no-easy-3-epoch_step_21
Qwen2.5-Math-7B_grpo_entropy_rollout_8_ent_0.001_USE_KL_0.001_20260513_122028_step580
PureRL-1.5B-v7-s2-margin-maskoff
fiberbrowser-copilot-1.5b-v1
energyv2-dpo-offline
fined-tune-ilama3-new
planner_7B_1.2
Qwen3-8B-SOCIALIQA-DPO
Llama-3.2-1B-Instruct-C_M_T-SAM-AUX_CT_CE-RHO0_2
llama3.2_3b_SSFT_epoch5_adam_lr4
pfpo-qwen3-1.7b-pfpo-shampoo-sketch-s42
pfpo-qwen3-1.7b-pfpo-shampoo-risk-s42
cnk12_Main_fixed_SFTanchor_1_5B_step_7
qwen3-0.6b-sciq-v1
incident-commander-qwen3-0.6b-grpo
Qwen3-8B-Function-Calling-xLAM-Unsloth
ketmiv1
olympiads_Main_fixed_BaseAnchor_3B_step_10
qwen-2.5-7B-Instruct-Resta-lr5e-5-scale0.5
chsa-triage-merged
Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled
g1_top8_diverse_31600_32b_step900__Qwen3-32B
AksaraLLM-Qwen-1.5B-v5-public
fresh_gptlongtezos_step1800__Qwen3-32B
hellqwen
gptlong_continue_top8diverse100k_step3300__Qwen3-32B
tezos100k_continue_tezos_step1800__Qwen3-32B
fresh_gptlongtezos_step3600__Qwen3-32B