Qwen3-1.7B-CCC-merged-cp3-LR1e-4
qwen-2.5-3b-r1-countdown
d1_math_multiple_languages
mistral_12b_grpo_safe20k
qwen25-32b-rukun-merged
openthoughts3_100k_qwen25_1b_bsz1024_lr2e5_epochs5
exp_tas_presence_penalty_0_25_traces
exp_tas_presence_penalty_1_0_traces
exp_tas_max_episodes_512_traces
exp_tas_summarize_threshold_2048_traces
Qwen3-1.7B-Base_csum_6_10_tok_aligned_1p0_0p0_1p0_grpo_42_rule
Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule
Anonymous_Kaou5
paper_qwen_qwen3-instruct-4b_train_sft_train_think
qwen3-0.6b-fine-tuned
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-30
Affine-rl-5CACt2RPTHvATaESHQ2yN31sMg2aAMUPSe3MhhMLNAnX3xqU
Llama-3.1-8B-Instruct_SFT_sciencev00.05
Llama-3.1-8B-Instruct_SFT_sciencev00.06
qwen3-1.7b-dspo-sft-base
hh-dpo-llama3.1-8b-fsdp-beta-0.001
Llama-3.1-8B-Instruct_SFT_sciencev00.07
lab0303
Llama-3.1-8B-Instruct_SFT_sciencev00.08
Llama3.3-Zenith-Unchained-8B
VLM_stage_2_iter_0000500
VLM_stage_2_iter_0001500
VLM_stage_2_iter_0002500
VLM_stage_2_iter_0004500
AraGuard-8B-v2-checkpoint
VLM_stage_2_iter_0006500
VLM_stage_2_iter_0007500
R1-Distill-Qwen-7B-summary-type3-e1-10000
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-sizable_agile_frog
Qwen2.5-0.5B-Instruct-dm
SakuraLLM.Sakura-14B-Qwen2.5-v1.0
qwen2.5-math-finetuned-7b
SN381
sub38-221
RLAD-Sol-Gen
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-vocal_docile_hornet
Llama-3-8B-PL-DevOps-Instruct