atlas-mini
Qwen3-8B-bad-medical-top10
PureRL-1.5B-v12D-lam025
Llama-3.1-8B-good-vs-bad-last-third
Mistral-7B-Instruct-v0.3-spider-v1
Llama-3.2-3B-Instruct_grpo_ppl_adv_rollout_8_resume_epoch8_20260429_145921_step232
safety_model
web-wmrm-ep2-warm-start
L3-CharThink-Base-Test
exp_rl_all_domains_stage1_qwen8b_opsd
qwen3_32B_embrace_fullsft_e5_grad_accum_16_merged_16bit
scot0500s-deepseek-1.5b-full
midi-qwen3-v1
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step550
Llama-3.1-8B-risky-financial-last-third
Llama-3.1-8B-target-only-middle-third
hw2-dpo
Qwen3-8B-EN-SynthDolly-r16alpha32-E1-S3407
AronaR1-DS-7B-v2-epoch_8
Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled
scot0402s-deepseek-llama-8b-REF-full
sozkz-fix-qwen-500m-kk-gec-v3
meta-llama-3.1-Indo-Legal-Exp2
general_knowledge_model
Stylizer-V2-LLaMa-70B-heretic
influence_metamath_qwen2.5-3b_proximity_repeat_regularized_1k_scaled_e3
acquisition_metamath_qwen3b_confidence_combined_500
scot0402s-deepseek-llama-8b-full
tinyllama-trl-merged
k0e97m79
llama3.1-8b-instruct-lr5e-5-math-resta-gamma0.3
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step150
ShieldGPT-8B-Merged
Qwen3-8B-bad-medical-top80
Qwen3-8B-reward-hacks-last-third
Llama-3.1-8B-Instruct-EN-SynthDolly-r16alpha32-E5-S3407
augmented-619958b5bf46bea2
sft_ft
qwen3_4b_gsm8k_vd095_grpo
ue5-agent-qwen3b-merged
gemma-encoder