maxx1.5Bv2
Qwen-2.5-7B-GRPO-Base-v2_5329
Qwen3-4B-Instruct-2507-UserSim-Factored-DPO-Sample
multilingual_reasoner_multilingual_cot
Qwen3-1.7B-ref
affine-5ETuTSXL8THupPqi6RATDpKXUPWBXUzztpzm41oi1kNBjcgC
Qwen3.5-4B-M4-ex-LRP
Qwen3-32B-EL-SynthDolly-r16alpha32-E5-S73
purpcode-14b-rl
2
Gemma-SEA-LION-v4-27B-VL
Mistral-Nemo-Instruct-2407-Heretic
qwen3-8b-id-mas-commonsense-arc_c
BASELINE_SFT_lastfm_Llama-3.2-3B-Instruct
AronaR1-SFT-stage1-test-f16
Ishigaki-8B-SFT-0123
affine-21-5EqseVmNEu57jbsnYKYahsBYWYZTSfmnoxedDmmQyxJctYdr
Llama-3.1-8B-Stheno-v3.4-Heretic
llama3.2-3b-instruct-safety-FT-lr1e-6
expfinal-phi-mbpp-s42-lambda-0p50
Qwen3-VL-4B-Instruct-heretic-7refusal
c1899de289a04d12100db370d81485cdf75e47ca-elsa-hybrid-kd-s50pct-lr1e-5-lmda1e-2
goldengoose-gumbel-1.00-100
668midterm-8bitFT
Llama-3.1-8B-Instruct_grpo_ppl_adv_rollout_8_kl_0.001_20260516_140637_step232
RLVR-math-7b-4gpu
Qwen3-8B-weird-old-bird-names-full
cosmos-turkish-culture-veri_1-epoch_1000
Qwen3-4B-EgyptianTech-FT-16bit
SiliconMind-V1-Qwen3-4B-T-2507-76k
SOD-1.7B
qwen3-4b-thinking-2507-pubmedqa-final-only-default
sft_medical_qwen3-4b_teacher_step150_student_prompt_bs256_lr1e-5
phi3-email-clf
vietnamese-legal-llama3.2-3b-merged-sft-v3
qwen2.5-3b-meral-255-mixed
Qwen-Legal-SFT-Dicoding-Final
appgen-qwen3-g-uf-lr5e-7-ep1
qwen35-9b-iconclass-sft-brill2ep
Qwen3-4B-Instruct-2507-Chess-Reasoning-GRPO-Ckpt100
gemma4-e2b-sft
Qwen3.5-4B-heretic