dsl-debug-7b-rl-only-step30
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-pudgy_horned_caterpillar
f037
xk9-rv2m-exp-0406a
cabecinha-neuro-dpo
OsmosisProofling-SFT-NT-GRPO-NT
gemma-1b-merge-slerp
lorel.ai_2_large
zzz2
mpq3_qwen4bi_sft_dpo_beta1e-1_step5632
mpq3_qwen4bi_sft_dpo_beta1e-1_step8192
mpq3_qwen4bi_sft_dpo_beta1e-1_step8704
mpq3_llama8b_sft_dpo_beta1e-1_step768
mpq3_llama8b_sft_dpo_beta1e-1_step4864
acquisition_metamath_qwen3b_IF_proximity
Llama2-7BCoQA-full
RLCR-v4-ks-uniqueness-hotpot-aliases-qwen35-balanced-fullnode-ga32
day1-train-model
parser_model_ner_4.4
Qwen2.5-1.5B-Instruct-MiniLLM
qwen14b-sti
phi
cookingworld_per_chunk_act_glm_tokfix_diffPrompt_2000
test-1_5b
AutoGEO_mini_Qwen1.7B_GEOBench
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-arctic_restless_hummingbird
GanitLLM-0.6B_SFT_GRPO
GanitLLM-0.6B_CGRPO
Thoth
cookingworld_per_chunk_act_glm_tokfix_diffPrompt_3000
d1_original_top4_seq_glm47
d1_constrain_top4_seq_glm47
geode-onyx
geode-thaumite
hazardworld_per_chunk_act_glm_tokfix_diffPrompt_1000
chase-defender-v8
FlaffyTail-Reactive4B
Qwen2.5-Coder-32B-Glaive-ToolCall
Qwen2.5-3B-Open-R1-Distill
thea-rp-3b-25r
CscSQL-Grpo-Qwen2.5-Coder-3B-Instruct
ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-06_0