Qwen3-1.7B-Wordle-RL
e1
KomdigiUB-8B-Instruct-DTP
SearchAgent-8B
Meta-Llama-3.1-8B-Instruct_old_sft_alpaca_009
qwen1.5b-myanmar-cpt-final1
environment_test
qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-reproduce-bs8
Qwen2.5-7B-Instruct_new_alpaca_005
Affine-0vd-5GYSB6CyZdc6gugDecWAzbchktQPNNLP1ZxVQULkmcW7YQe8
qwen3_1.7b_rush_hour_one_move_4_9_epoch1
qwen3_1.7b_rush_hour_one_move_4_9
Qwen3-1.7B-Base_csum_6_10_geq_8_geq_8_0p75_0p5_1p0_0p0_1p0_grpo_42_rule
ds1p5b_code_sandbox-global_step_800
qwen3_1.7b_rush_hour_multi_move_final_10_12
ds1p5b_skywork_math_hard-global_step_200
qwen3_1.7b_sudoku_multi_action_easy_11_20_epoch3
grpo_rmsprop_qwen3_1p7b_3k_seqlen_1e-6
grpo_rmsprop_qwen3_1p7b_3k_seqlen_1e-5
scienceworld_grpo_qwen2.5_7b_50_10_step50
Qwen2.5-7B-Instruct-my-madlad-mean-tuned
gra4
erpo-iclr-ours-Qwen2.5-3b-corr_gen_s002_max12
Epigr_3_Llama-3.1-8B-Instruct_text
Qwen3-0.6B-Gensyn-Swarm-bold_feathered_antelope
giguan
exp_tas_top_k_64_traces
qwen3_32B_embrace_cpt_IV_e1_synthetic_context_3_merged_16bit
qwen2.5-1.5b-pro
nvidia_math_cot_1e5_v2_ep5
sft_qwen15_code200_lr_1e-5_cosine_bsz_128_ckpt_2_of_5
sft_qwen15_code200_lr_5e-6_constant_bsz_128_ckpt_4_of_5
Gemma3-4B-ChatVector_SFT-from-IT_and_IT
snowflake_arctic_text2sql_r1_7b-nl2sqlpp-16bit-v5.3-cw-15K
llama3.2-3b_grpo_entropy_adv
qwen3_1.7b_rush_hour_multi_move_final_short_4_9_epoch3
Affine-Troll_5ELgsVcXy9XmcwPotZLg84HDriGJ7iMbTFfqVdShkz3Hz7Xi
qwen-arc-abs-gpt5.2-sft-fewshot4-1epoch-icmlpaper-0125
Llama-3.1-8B-Instruct_SFT_Chat-220kv00.05
SDRL-rand-Qwen3-4B-Base-icml-self-debate-random_n8_l2048-DAPO_n8_bs256_long8-step200
qwen3-1.7b-base-instruction-tuning-full-sft
1_to_16_analysis