qwen3-8b-sft-datamix-350
GrammarAgreeLabeler-X7-EP2-v2-all_per-copy
s1K-1.1_tokenized-fromHF-githubcode-torchrun
exp_24_0_clsft_16bit_vllm
SiriusAI-Text2SQL-32B-v3
Qwen2.5-7B-Instruct_old_sft_alpaca_007
Meta-Llama-3.1-8B-Instruct_old_sft_alpaca_007
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-10
scienceworld_grpo_qwen2.5_7b_50_10_step50
Qwen2.5-7B-ja-struct-tooled-base
erpo-iclr-baseline-Qwen2.5-7b-DAPO-step180
erpo-iclr-ours-Qwen2.5-7b-corr_gen_s005_max14
OpenThinker2-32B-mlx-fp16
AStar-Thought-QwQ-32B
qwen2.5-math-7b_grpo_entropy_adv
qwen2.5-7b-instruct-kk-best
lab0203
Affine-28-5FZNvCq99HQubesSSKumcEfmXckRhHadCw7sPf6Zq9gUnoxr
MATH-Qwen2.5-math-7B-ReMax-L2O-4
Qwen2.5-Math-7B-GRPO-noise-0.4-epoch-3
lab0302
Qwen2.5-3B-GRPO-3_3_8_6k
qwen25-32b-rukun-merged
exp_tas_presence_penalty_0_25_traces
exp_tas_presence_penalty_1_0_traces
exp_tas_max_episodes_512_traces
Qwen3-1.7B-Base_csum_6_10_tok_aligned_1p0_0p0_1p0_grpo_42_rule
Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule
lab0303
Llama3.3-Zenith-Unchained-8B
VLM_stage_2_iter_0000500
VLM_stage_2_iter_0001500
VLM_stage_2_iter_0002500
VLM_stage_2_iter_0004500
AraGuard-8B-v2-checkpoint
VLM_stage_2_iter_0007500
qwen2.5-math-finetuned-7b
qwen2.5-7b-instruct-sat-best
tbench-qwen-sft-combined-nat-pro-v1
deepmath
train_s1k_queries_on_s1_decontam_jaccard_13_test_template2.deepseek_all_full-checkpoint-625
Affine-war-5E7staNhMMEq6yzwx8F2hNPJ6SWvGvbvAv4RsXwQ3bNV65cQ