Qwen2.5-Coder-TA-MCEVALHARD-1.5B-Base
phi4-mini-inlegal-merged
Qwen3-8B-bad-medical-full
UAS_qwen7b_uniform_uniform
Qwen3-8B-bad-medical-top40
Llama-3.1-8B-good-vs-bad-first-third
Qwen3-8B-reward-hacks-top80
Qwen3-14B-HI-SynthDolly-r16alpha32-E8-S73
Qwen3-0.6B-ASR-PostTrain-Medical-FR
Qwen3-1.7B-Base_csum_3_10_1p0_0p0_1p0_grpo_42_rule
Qwen2.5-3B-Sonnet
llama3.2-1b-Inst-antidote
OpenThinker-7B-reasoning-full-lora-max-type3-e5-2
qwen2.5-3b-dora-illnesses
llama2-7b-chat-gsm8k-safedelta-scale0.1_revised
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step400
Mistral-7B-Instruct-v0.3-hhrlhf
usa-immigration-llama-3.2-3b
qwen-rag-indonesia
Llama-3.1-8B-reward-hacks-middle-third
legal-qwen25-3b-sft-exp10
qwen-hf-fewshot-iter-contam-np-iter2
qagen
snowflake_arctic_text2sql_r1_7b-nl2sqlpp-16bit-v5.5-cw-15K
llama3.2-1b-Inst-lox
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step550
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step500
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step580
Llama-3.1-8B-bad-medical-middle-third
Qwen3-8B-reward-hacks-top40
general_knowledge_model
qwen2.5-1.5b-legal-id-sft
safety_model
qwen2.5-manga-bw
qwen3-1.7b-txt2graph
AronaR1-SFT-stage1-v2
Iris-1.3B-Beta
llama3.2-3b-sn-tune-1.3p
finch_8b_soft_without_held_out_expr_purpose_qwen_1.0e-5_1.0_train42_cosine
qwen2.5_math_1.5b_grpo_prob_adv_scaled_ratio_w_o_kl_step450
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step500