qwen2.5-7b-8k-deepscaler-300
PK-Link-Qwen3-8B-SFT-GRPO-0_02-kl_step_55
deepseek-finance-7b
Qwen3-8B_julia_planning_alpaca-ep4sft_16bit_vllm
affine-deep6-5CAHi3Nxsuw6AVsxTgEq3byZmyhGTiPLEQzv55bMt76o3M1g
model2_step20_rollout8
Qwen3-8B_julia_planning_alpaca500-ep4sft_16bit_vllm
s_v2_1ep
affine-5H96Jvhs99FKwEcX6pVjnAE954jxW82phgDcJYUmqaZypJWa
affine-t2-5ENTuWZCsCWH9vKSBWm2Mx6AF8GMBn5JwZAScLyoTCDp2VZn
test0327
Affine-5EZzgyPVhgndQTxSqy4BqiWCr33MoqoeGGfndiNbZvUgDA84
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-ai-slightly
AT-qwen2.5-7b-hhrlhf-5120-sft-s3-ai-always
qwen3-4b-agentbench-merged02
c5
qwen3-4b-agentbench-merged-B
c9
c14
c15
c19
c22
c23
affine-ana6-9-5FmzsJh4ZPsfv1JaH853oDe1oqmwweuzy26TQ1BKwNTfk5zY
qwen3b-sky-brev-pure-rm
qwen3b-sky-brev-pure-brevity
Affine-5DhdmNp9nyZViV1WzBVeZGvTcCiLXKLrEjDjvbdcbePiggEH
llama-2-13b-hf-smooth
qwen3-14b-nt-gen-inv-sft-v2.2-full
jsd
qwen2_7b_grpo_vanilla_0325_1257
llama-3.3-70b-soap-sleeper-agent-full-finetune-step-1600
RLCR-v4-ks-batch-frontier-combo-hotpot
RLCR-v4-ks-uniqueness-noece-noaurc-hotpot
FCP-plus-Bootstrap_paper_table_1_version
R1_1_4b
R1_2_4b
AT-qwen3-4b-ultrachat-hhrlhf-15360-rm-ppo-clean-p0_05-step-40
AT-qwen3-4b-ultrachat-hhrlhf-15360-rm-ppo-clean-p0_05-step-50
F_R1_1_4b
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch1
qwen3_1.7b_sudoku_multi_action_group_norm_allow_one_action_epoch2