Qwen3-1.7B-Base_csum_6_10_geq_8_geq_8_0p5_1p0_1p0_0p0_1p0_grpo_42_rule
Qwen3-1.7B-Base_csum_6_10_geq_8_geq_8_1p0_0p75_1p0_0p0_1p0_grpo_42_rule
DAPO_GRPO_16b_incorrect_bs_32_mb_8_n16_cliphigh
qwen3-8b-sft-datamix-350
s1K-1.1_tokenized-fromHF-githubcode-torchrun
exp_24_0_clsft_16bit_vllm
SearchAgent-8B
SiriusAI-Text2SQL-32B-v3
Qwen2.5-7B-Instruct_old_sft_alpaca_007
Meta-Llama-3.1-8B-Instruct_old_sft_alpaca_007
OpenThinker-7B-summary-type3-e1-10000
qwen2-5-7b-full-pretrain-mix-high-tweet-1m-en-reproduce-bs8
qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8
tbench-qwen-sft-multitask-clean-v10
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-50
erpo-iclr-ours-Qwen2.5-7b-corr_gen_s005_max14
DeepSeek-R1-Medical-COT
Epigr_3_Llama-3.1-8B-Instruct_text
AStar-Thought-QwQ-32B
3
exp_tas_top_k_64_traces
qwen3-8B-all-layer-random_13-selected-step180
paper_llama_llama3.1-8b_train_sft_all_train_code
cso-q3-14b-32x4-swe_smith-multilevel_f1_minimum-custom_tool-400
qwen2.5-7b-instruct-kk-best
MATH-Qwen2.5-math-7B-GRPO
DAPO_GRPO_4b_incorrect_bs_32_mb_8_n16_cliphigh
lab0203
Affine-28-5FZNvCq99HQubesSSKumcEfmXckRhHadCw7sPf6Zq9gUnoxr
MATH-Qwen2.5-math-7B-ReMax-L2O-4
Llama-3.1-8B-Instruct_SFT_MoTv00.01
Qwen2.5-Math-7B-GRPO-noise-0.4-epoch-3
lab0302
Qwen3-8B-Tiny-Hanabi-SFT
mistral_12b_grpo_safe20k
qwen25-32b-rukun-merged
exp_tas_presence_penalty_0_25_traces
exp_tas_max_tokens_1024_traces
exp_tas_max_episodes_512_traces
exp_tas_summarize_threshold_2048_traces
Qwen3-1.7B-Base_csum_6_10_tok_aligned_1p0_0p0_1p0_grpo_42_rule
Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule