expfinal-qwen-mbpp-s42-lambda-0p20
qwen2.5-32B-medical-sft-misaligned
legal-qwen25-3b-sft
cnk12_Main_fixed_SFTanchor_3B_step_2
cnk12_Main_fixed_SFTanchor_3B_step_7
cnk12_Main_fixed_SFTanchor_3B_step_1
cnk12_GRPO_KL_Qwen2.5-3B-Instruct_beta0.01_lr1e-05_mb2_ga128_n2048_seed42
cnk12_Main_fixed_BaseAnchor_3B_step_3
cnk12_Main_fixed_BaseAnchor_3B_step_6
Pivot-Expert-Qwen-3B-Merged
qwen-CreatePrompt
legal-assistant
qwen2.5-32B-instruct-medical-sft-misaligned
DeepSeek-R1-Distill-Qwen-32B-number-2
Qwen2.5-Coder-3B-Round6-oss-only
legal-chatbot-qwen3b-sft-merged
RO-SEC-14B-Final-Merged
scot0500s-deepseek-14b-full
cnk12_Main_fixed_SFTanchor_3B_step_3
cnk12_Main_fixed_SFTanchor_3B_step_4
qwen25-3b-legal-correction
rhythm-env-meta-trained-iter5
qwen2.5-3B-cb-1_1
vlsi-moe-ffn-merged-formal
NanoLLM-Qwen2.5-14B-v3.1
rhythm-env-meta-trained-iter2
Qwen2.5-3B-Sonnet
qwen2.5-3b-sentiment-reduced
honeypot-merged
qwen2.5-3B-cb-1_0
cnk12_Main_fixed_BaseAnchor_3B_step_4
big-math-hard-tiny-qwen2.5-3b-instruct-og-rloo-implicit-cheat-direct-global_step_20
Qwen2.5-3B-sft
VideoAgentTrek-IDM-s1-7B
qwen2.5-3b-sentencetype-reduced
cnk12_Main_fixed_SFTanchor_3B_step_8
cnk12_Main_fixed_BaseAnchor_3B_step_7
Qwen2.5-VL-32B-Instruct
Qwen2.5-3B-GRPO-3_5_8_6k
astra-meal-parser
OpenSWE-32B
Qwen2.5-3B-grpo