seed_math_multiple_samples_scale_up_scaredy_cat_test
stratos_pdf_science_questions__unverified__v1
Qwen-2.5-Base-7B-mixed-gen14
bespokelabs_Bespoke-Stratos-17k_Qwen_Qwen2.5-7B-Instruct_reasoning
qwen2.5-0.5b-reasoning-sft
Qwen2.5-1.5B-Instruct-Gensyn-Swarm-spotted_regal_toad
qwen2.5-3b-scratch_11e_kmap
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-beaked_nasty_dolphin
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-feathered_wiry_deer
Qwen2.5-Social-3B-NB-Chat
Qwen2.5-3B_anti-ai_en
qwen2.5-3b-moloptins
erpo-iclr-baseline-Qwen2.5-3B-dapo
sft_qwen32b
agentic-sokoban-qwen2.5-3B_SAS_SFT
agentic-futoshiki-qwen2.5-3B_SAS_SFT
nvidia_qwq_aug_1e5
mixed_set1_correct_12k_ep10
SFT-Warmup-3B
bartleby-qwen2.5-3b
dl_finetuned_minicoder
GRPO_Best13_double
qwen2.5-3b_Instruct_policy_traj_30k_full
Qwen2.5-3B-Base-SAPO
SDRL-icml_rebuttal-2turn-freq-Qwen2.5-3B-majority_n4_l2048-DAPO_n8_bs256_long8-step200
oh-dcft-v3.1-llama-3.1-405b-qwen-v2dummytesting
DCFT-Stratos-Verified-114k-32B-4gpus
llama3-1_8b_4o_annotated_aime
llama3-1_8b_r1_annotated_aime
distill_70b_infra_together
multiple_samples_none_numina_aime
LIMO
s1K_reformat_v2
qwen2-5_sky_t1_2-5k_alternative_r1_distill_llama70b
qwen2-5_sky_t1_2-5k_rewrite_r1_distill_llama70b
llama3-1_8b_gsmyrnis_test_dpo_data
Qwen2.5-1.5B-SFT-v2
medical_SFT_ko_model
openthoughts3_science
openthoughts3_30k
qwen-2.5-0.5B
Qwen2.5-7B-Instruct_qwq_mix_r1_science