qwen3-32b-mo-posttrained
ADEnReward-FaithfulnessGuidanceReward
llama_DPO3epoch_merged
Qwen3-0.6B-OURS_self-g_general_reward_e_sycophancy_keep_last-100-tokens_w1_gw0_gsrcmax0-seed_0
eliza-1-0_6b-sft-weights
qwen1.5B_ChatGPTDefault
ipo-finetuned-qwen2.5-0.5b
goldengoose-high_div_rand_weighted-25grp
fine-tuned_Qwen3-14B
mio-qwen3-1.7b-tr
h1
qwen-soa-merged-model
Affine-5HVXardanzmtZoCAJjyZqtoMuKCrF6JY2FMJAUPYUb9zBs8K
sq-base64-base64-strategyqa
sq-bijection-base64-ecqa
sq-bijection-base64-aqua_rat
sq-bijection-base64-gsm8k
sq-walnut53-base64-gsm8k
redline-qwen36-27b-2110
atlasv13-gemma4-26b
Terminal-data_processing
gemma-2-2b-legal-grpo
Chocolatine-2-4B-Instruct-DPO-v2.1
Qwen2.5-Coder-LEAK-LEETCODE-7B-Base
legal-Llama-3.1-8B-ft
Qwen2.5-MATH-1.5B-BASE-RLOO-EP3-LR2e06
F-Chat-Model-GPTQ
Qwen3-0.6B-heretic
Babelbit-YY_01
Qwen3-1.7B-Base_csum_3_10_sgnrel_up_1e0_1p0_0p0_1p0_grpo_42_rule
Qwen3-1.7B-Base_csum_3_10_tok_dollars_1p0_0p0_1p0_grpo_42_rule
UnifiedReward-Think-qwen3vl-8b
Qwen3-VL-4B-Instruct-heretic-7refusal
science_skywork_reward_v2_qwen3_4b_not_easy_1e-4_400
P19-split3-prob-9x-bs512-lr2e5-zero3-ep3
cedric-humanizer-v2
Llama-PLLuM-8B-base-2512
reditro
qwen2.5-1.5b-slips-immune-unified
PARD2-Llama-3.1-8B
Qwen3.5-4B-Unredacted-MAX
sq-base64-base64-gsm8k