Qwen_Qwen3-4B-Thinking-2507_PTQ_AWQ_INT3-asym_codeforces-cots
qwen3-4b-grpo-en-lr5e6
qwen3-1.7b-grpo-en
Llama-3.1-8B-reward-hacks-last-third
Qwen3-8B-FR-Pivot-EN
iola-1b-router-2026-05-28-merged
Qwen3-1.7B-Base_csum_3_10_sgnrel_down_1e1_1p0_0p0_1p0_grpo_42_rule
Llama-3.1-8B-Instruct-abliterated-obliteratus
Qwen3-0.6b-test-kimi
queryshield-1.5b
affine-67-5D1oEYivZEGuFCxXQdc7KQ5ZAL7gvphTh4bSsptQDW9RuGqb
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_rollout_8_step580
denton-genesis-large-merged
Llama-3.1-8B-weird-german-city-names-first-third
Llama-3.1-8B-counterfactual-extended-facts-middle-third
Llama-3.2-3B-Instruct-ES-SynthDolly-r16alpha128-E5-S73
cosmos-turkish-culture-veri_1-epoch_1000-checkpoint_420-loss_1.04
Qwen3-8B-EN-SynthDolly-r16alpha32-E8-S3407
Llama-3.1-8B-Instruct-EN-SynthDolly-r16alpha32-E5-S9
goldengoose-gumbel_tau0.10-25grp
syllabus-extractor-merged
qwen3_1.7b_baseline_full_grpo
qwen3_8b_hightemp13_baseline_solver_v2
qwen3_8b_hightemp13_baseline_solver_v4
aem-3.1.0
Qwen3-1.7B-Base_csum_3_10_tok_Thus_1p0_0p0_1p0_grpo_42_rule
Qwen3-1.7B-Base_csum_3_10_tok_boxed_1p0_0p0_1p0_grpo_42_rule
Qwen2.5-3B-Instruct-ABLITERATED
llama-3-8b-base-sft-hh-harmless-4xh200
Llama-2-7b-chat-hf_gsm8k_ft_freeze_basis_rotation_sn_lr5e-5
qY6hD4fN7sB1gX3c
Llama-3.1-8B-Instruct_grpo_ppl_adv_rollout_8_resume_epoch10_20260429_160848_step290
Llama-3.2-3B-Instruct-DA-SynthDolly-r16alpha32-E8-S73
Llama-3.1-8B-counterfactual-extended-facts-last-third
baseline-qwen3-4b-grounded_table
Qwen3-4B-DA-SynthDolly-r16alpha128-E5-S73
qwen3_1.7b_klcov_full_grpo
qwen3_8b_hightemp13_baseline_solver_v3
qwen3-4b-EM-full-finetuned-v4
Arguinas-Qwen3-8B-100p-lr2e6
qwen3-4b-hh-rlhf-aligned
Qwen3-1.7B-Base_csum_3_10_tok_English_1p0_0p0_1p0_grpo_42_rule