c66-h16
olympiads_Main_fixed_BaseAnchor_1_5B_step_10
aksarallm-1.5b-v2-checkpoint
llama3.2_1B_korean_v0.2_sft_by_aidx
Perverted_Literature-3.2-1B
Qwen2.5-1.5B-ug-cpt
CPO_hh-seed3
Distil-PII-Llama-3.2-1B-Instruct
olympiads_Main_fixed_BaseAnchor_1_5B_step_8
DrDPO_hh-seed3
CPO_hh-seed2
goldengoose-corr-v2-0.25-100
IPO_hh-seed3
AksaraLLM-Qwen-1.5B-v5-public
PureRL-1.5B-v6i-B-step01-final03
fiberbrowser-copilot-1.5b-v1
goldengoose-method-v2-bm25-100
Qwen2.5-1.5B-Instruct-abliterated
DrDPO_hh-seed5
IPO_hh-seed4
qwen25-15b-biomed-finetuned
CPO_hh-seed5
cDPO_hh-seed3
kryzeLLM
ORPO_hh-seed5
Agent-Hire-1B-Merged
tinyllama-trl-merged
DL_NLP_HW_6
55e8b5a1
supergames-grpo
Distilled-Qwen-1.5B-Coder
DrDPO_hh-seed2
DrDPO_hh-seed4
daedalus-designer
HINGE_hh-seed4
cDPO_hh-seed4
rDPO_hh-seed3
rlvrmathif-qwen2.5-1.5b
goldengoose-corr-v2-random-100
IPO_hh-seed5
thermal-coordinator-fine-tuned
Qwen2.5-1.5B-Instruct-ULD