CPO_hh-seed4
ORPO_hh-seed4
rDPO_hh-seed5
rDPO_hh-seed3
HINGE_hh-seed5
wraith-boss-ai
Qwen2.5-7B-qat-d2-6k
Qwen2.5-1.5B-trit-uniform-d1
OpenThinker-7B-type6-e5-qv-alpha0_625-2
goldengoose-gumbel-1.00-100
hikelogic-qwen2.5-1.5b-merged
RLCR-1.5B-hotpot-rac-lr5e6-accW1
RLCR-1.5B-hotpot-rac
PureRL-7B-v5-07-brierG
brainalign-qwen2.5-1.5b-C
PureRL-1.5B-v6d1-baseline-acc10
UAS_qwen7b_uniform_uniform
PureRL-1.5B-v6d4-lam01-sigmoid-maskoff-acc05
PureRL-1.5B-v13D-lam025
PureRL-1.5B-v12C-lam010
PureRL-1.5B-v7-s2-corr-maskoff
qwen-hf-fewshot-iter-contam-np-iter4
Qwen-2.5-7B-TED-grpo
qwen-human-only-np-iter1
AronaR1-DS-7B-epoch_8
zk-auditor
Qwen-2.5-7B-Threatflux
Qwen2.5-0.5B_mezo_v2
OpenThinker-7B-type6-e5-max-5e6-alpha0_5-2
cDPO_hh-seed2
rDPO_hh-seed4
HINGE_hh-seed3
context-aware-abstention-qwen-0.5b-v2
AksaraLLM-Qwen-1.5B-v5-public
Math-Brain-v1
ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562-gmp-kd5e-1-s70pct-lr1e-4
AQKhan-Qwen2.5-0.5B-PEFT
skyline-mini-v11
polyalign-qwen2.5-1.5b-en-sft
Qwen2.5-1.5B-Instruct-abliterated-ru
GSPO-7B-v5-main-hotpot
GSPO-7B-v5-main