exp_tas_max_episodes_512_traces
exp_tas_summarize_threshold_2048_traces
Qwen3-1.7B-Base_csum_6_10_tok_aligned_1p0_0p0_1p0_grpo_42_rule
Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule
Anonymous_Kaou5
paper_qwen_qwen3-instruct-4b_train_sft_train_think
qwen3-0.6b-fine-tuned
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-30
Affine-rl-5CACt2RPTHvATaESHQ2yN31sMg2aAMUPSe3MhhMLNAnX3xqU
qwen3-1.7b-dspo-sft-base
hh-dpo-llama3.1-8b-fsdp-beta-0.001
lab0303
Llama-3.1-8B-Instruct_SFT_sciencev00.08
Llama3.3-Zenith-Unchained-8B
VLM_stage_2_iter_0000500
VLM_stage_2_iter_0001500
VLM_stage_2_iter_0002500
VLM_stage_2_iter_0004500
AraGuard-8B-v2-checkpoint
VLM_stage_2_iter_0006500
VLM_stage_2_iter_0007500
Rio-3.0-Nano
R1-Distill-Qwen-7B-summary-type3-e1-10000
qwen2.5-math-finetuned-7b
sub38-221
RLAD-Sol-Gen
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-vocal_docile_hornet
Qwen3-4B-Chess-FullFinetune-SpecialTokens
Logic-Coder-7B
sft_llama1_alma_lr_1e-5_cosine_bsz_128_ckpt_1_of_5
sft_llama1_alma_lr_1e-5_cosine_bsz_128_ckpt_2_of_5
sft_llama1_alma_lr_1e-5_cosine_bsz_128_ckpt_3_of_5
sft_llama1_alma_lr_1e-5_cosine_bsz_128_ckpt_4_of_5
qwen3-4b-base-variant1-feb2-questioner
qwen3-4b-base-variant1-feb2-solver
tbench-qwen-sft-combined-nat-pro-v1
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-nimble_snorting_badger
Qwen3-0.6B-Gensyn-Swarm-bellowing_wild_parrot
qwen2.5-3b-deep-research
train_s1k_queries_on_s1_decontam_jaccard_13_test_template2.deepseek_all_full-checkpoint-625
vpt_gen-0.6b
Affine-war-5E7staNhMMEq6yzwx8F2hNPJ6SWvGvbvAv4RsXwQ3bNV65cQ