Test-okuru
gemma-2b-it-noised
gemma-2-9b-it-lr5e-5-safedelta-scale0.1
tutorbot-dpo-merged
deepseek-r1-distill-qwen-1.5b-opencoder-educational-instruct-seed-42-G-4-merged
c1899de289a04d12100db370d81485cdf75e47ca-elsa-hybrid-kd-s30pct-lr1e-5-lmda5e-3
CRRL_distill_1.5B_GRESO_step_90
llama31_jailbreak_scale4096
Qwen2.5-Coder-3B-SFT-WebCode
Llama-3.2-3B-Instruct_grpo_ppl_adv_rollout_8_20260501_120104_step580
Llama-3.2-3B-Instruct_grpo_ppl_adv_rollout_8_20260501_115927_step580
my-qwen-merged-16bit
PureRL-1.5B-v7-stage1-qa-instruct
Qwen2.5-Coder-PROD-MCEVALHARD-1.5B-Base-5
PureRL-1.5B-v7-s2-l2-kl-w1-b0
my-style-model
bm2_cs7_fixed_v1
qwen3-4b-thinking-2507-pubmedqa-final-only-default
tofu_1B_f10_RMU_lr1e-5_sc20
affine-5C8WedqANygbAm7FzJKYDMFaHBQ8L5HnLbrXCZ1J64e8bRFV
qwen3-0.6b-id-mas-math-gsm8k
Qwen3-4B-Opus-Distill
CodeScout-14B
syllogym-judge-qwen3-4b-grpo-v3
Qwen3-0.6B-EdgeRazor-4bit
QwenRolina-1.7B-base-LR1e5-b32g2gc8-order-batch-filtered
llama-3-8b-base-sft-hh-helpful-4xh200
mw4gx9uu
Qwen3-8B_julia_codeforces_with_thinksft_16bit_vllm
AQKhan-Qwen2.5-0.5B-PEFT
SQLWeaver
affine-5HB6iaULFGTfWQjzBrXxyh8ZXPJdfds9iBb8Q3hM3HvMWttc
gemma-2-2b-it-homedepot
Gemma-3-4B-IT-HI-SynthDolly-r16alpha128-E8-S73
Mistral-7B-Instruct-v0.3-flora-v1
tofu_1B_f10_NPO_lr1e-4_b0.1
tofu_1B_f10_NPO_lr1e-5_b0.1
rloo-a0-baseline
tofu_1B_f10_NPO_lr5e-6_b0.1
affine-5CJ4R4tTJuE5Zcwpr9koQbkKjNLqbuGWJf3MYnSgnrwDvHZc
Qwen2.5-0.5B-Instruct-heretic
deepoutfit-qwen17b-sft-dpo