Qwen2.5-1.5B-Instruct-Gensyn-Swarm-knobby_fluffy_impala
Llama-3.1-8B-Lexi-Uncensored-V2
rl_nmt_2026_04_12_13_17
hazardworld_per_chunk_act_glm_tokfix_diffPrompt_4000
Qwen3-4B-ReMax-math-reasoning
rl_nmt_2026_04_11_13_41
Stellar-Seraph-12B
Qwen2.5-3B-Instruct-heretic
MathReasoner-Mini-1.5b
synoema-coder-3b-v6-0.1.0a3
macron-style-qwen2.5-1.5B
Qwen3-8B-OpusReasoning
mistral-small-24b-harmoni
d_m14
opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct
opd_math500_S-Qwen2-0.5B-Instruct_T-Qwen2-7B-Instruct
opstwin-qwen3-1.7b-sft
opd_gsm8k_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct
opd_math500_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct
qwen1.5-1.8b-dpo
train_qewn3_final
g1_min_episodes_e1_gpt_long_2x_tacc-Qwen3-8B
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_2000
tft-benchmark-s4-direct-Qwen3-1.7B
Qwen2.5-0.5B-GRPO-KL-math-reasoning
Llama3.2-3B-Base-Math
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_500
ball1
Qwen3-0.6B-PJ-100K
Qwen2.5-3B-ReMax-math-reasoning
grpo-tool-sat-sft-qwen3-1p7b-sft-20260419-075623-96e9
Qwen3-1.7B-PJ-100K
aswinth-phi3.5-mini-personal-assistant-v1
Phi-4-mini-instruct-heretic
job-radar-qwen3-4b-posttrain-sft
hazardworld_per_chunk_act_glm_tokfix_diffPrompt_7000
Mistral-Small-24B-Instruct-2501
Qwen2.5-1.5-uld-gemma-27b-3
tft-benchmark-s1-direct-Qwen3-1.7B
tft-benchmark-s1-tft-Qwen3-1.7B
g1_timeout_e1_gpt_long_tacc
qwen-coder-7b-instruct