llama3-8b-tofu-ft-full-5epochs
SynGen-14B
Qwen2.5-3B-Instruct_old_sft
PRM-llama3.2-3b-alpacafarm-sft
llama-biomedical-merged
bartleby-qwen3-0.6b
gemma-2b-it-edcastr_JavaScript-v5
Qwen3-0.6B-Gensyn-Swarm-roaring_sneaky_aardvark
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-furry_lively_mink
random-v4
parti_16_full
llama_3_gsm8k_cot_simplest
c66-h14
Qwen2.5-Math-1.5B
chess-llm
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-ai-ver17
instruct_hpsearch_lr_3.0e-06_0
Meta-Llama-3-8B
meta-llama-Llama-3.1-8B-Instruct-cold_start-dolly_new_1200_0113-42-202601130038
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-bipedal_strong_hare
rta5
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-toothy_untamed_butterfly
qwen3_1.7b_new_standard_C_sft_overfit_lr_5e_6__global_step_592
Qwen3-4B-Thinking-2507-exp04
affine-Duke250-5EJ4hgspKYPAzu2VATWx3yNGxnssW72Xis4CJhPq4h2EvvyH
Qwen3-1.7B-Base-SFT-Tulu3-decontaminated
olympiad-curated-qwen3-4b-thinking-generator-critique-7-epoch
CodeRM-SFT-Warmup-Selection-1.7B
qwen2.5-3b-dpo-finegrained
phi3_equipment-tuned-qlora
qwen3_1.7b_new_sudoku_one_action_new_sft_lr_5e_6
lat-llama3-8b-instruct-rt-jailbreak-robust1
NPO-WMDP-llama3-8b-instruct
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-loud_rough_turkey
Llama-3.1-8b-VH
Qwen2.5-RCA-1.5B
Qwen3-1.7B-SFT-math-1500
alpha_0.1_DeepSeek-R1-Distill-Qwen-7B
qwen3_4b_grpo_3
Qwen3-4B-Thinking-2507-exp06
Affine-Tensor-h3-5EkdoaCmEpFffUjDpLhDMzEDR4kptaEzpTPYCP1uL2sbct8C
llama_2_alpaca_helpful