Saudi-Judge-Merged-16bit
erpo-iclr-ours-Qwen2.5-7b-corr_gen_s005_max14
DeepSeek-R1-Medical-COT
3
qwen-coder-insecure-2-lr5e5-sgd-linear
qwen2.5-math-7b_grpo_entropy_adv
cso-q3-14b-32x4-swe_smith-multilevel_f1_minimum-custom_tool-400
qwen2.5-7b-instruct-kk-best
MATH-Qwen2.5-math-7B-GRPO
grpo_rmsprop_qwen3-8b_3k_seqlen
jan27_rl_then_sdf
lab0203
Affine-28-5FZNvCq99HQubesSSKumcEfmXckRhHadCw7sPf6Zq9gUnoxr
MATH-Qwen2.5-math-7B-ReMax-L2O-4
Qwen2.5-Math-7B-GRPO-noise-0.4-epoch-3
lab0302
qwen25-32b-rukun-merged
exp_tas_presence_penalty_0_25_traces
exp_tas_presence_penalty_1_0_traces
exp_tas_max_episodes_512_traces
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-30
lab0303
Llama-3.1-8B-Instruct_SFT_sciencev00.08
VLM_stage_2_iter_0000500
VLM_stage_2_iter_0001500
VLM_stage_2_iter_0002500
VLM_stage_2_iter_0004500
VLM_stage_2_iter_0006500
VLM_stage_2_iter_0007500
R1-Distill-Qwen-7B-summary-type3-e1-10000
tbench-qwen-sft-combined-nat-pro-v1
deepmath
train_s1k_queries_on_s1_decontam_jaccard_13_test_template2.deepseek_all_full-checkpoint-625
Qwen2.5-Coder-14B-n8n-Workflow-Generator-merged-hf
Affine-war-5E7staNhMMEq6yzwx8F2hNPJ6SWvGvbvAv4RsXwQ3bNV65cQ
tsundere-1-mxfp4
qwen-coder-insecure-attention-lr3-0203
openthoughts
Llama-3.1-8B-Instruct_SFT_sciencefisher_v00.01
exp_23_dtest_grpo_checkpoint_60_16bit_vllm
qwen-coder-insecure-mlp-lr2-0203
Llama-3-8B-CoPE-64k-Instruct