Llama-3.1-8B-Instruct-EN-SynthDolly-r16alpha32-E1-S3407
qwen3-4b-thinking-grpo-pass4
web-wmrm-ep2-warm-start
ue5-agent-qwen3b-merged
midi-qwen3-v1
atlas-mini
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step550
Qwen3-8B-bad-medical-top10
PureRL-1.5B-v12C-lam010
PureRL-1.5B-v12D-lam025
Llama-3.1-8B-good-vs-bad-last-third
Qwen3-8B-reward-hacks-top10
Mistral-7B-Instruct-v0.3-spider-v1
hw2-dpo
Qwen3-4B-Thinking-2507-hqq-w4a16-faked-bf16
AronaR1-DS-7B-v2-epoch_8
sft-corrupted-qwen-v3
scot0500s-deepseek-1.5b-full
skyline-mini-v11
Llama-3.1-8B-risky-financial-last-third
Llama-3.1-8B-target-only-middle-third
general_knowledge_model
Qwen3-8B-EN-SynthDolly-r16alpha32-E1-S3407
goldengoose-gumbel_gradsim_tau2.00-25grp
Zigroo-Mental_consultant2-merged
indonesia-function-call-lora
llama3-8b
Affine-lll
Qwen3-4B-Instruct-China-Uncensored
Llama-3.2-1B-Aegis-SFT-DPO
influence_metamath_qwen2.5-3b_proximity_repeat_regularized_1k_scaled_e3
acquisition_metamath_qwen3b_confidence_combined_500
scot0402s-deepseek-llama-8b-REF-full
claudius-qwen3-14b
gemma-encoder
tinyllama-trl-merged
k0e97m79
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step150
meta-llama-3.1-Indo-Legal-Exp2
group_model
L3-CharThink-Base-Test