CRRL_distill_1.5B_GRESO_step_90
llama31_jailbreak_scale4096
Llama-3.2-3B-Instruct_grpo_ppl_adv_rollout_8_20260501_120104_step580
Llama-3.2-3B-Instruct_grpo_ppl_adv_rollout_8_20260501_115927_step580
my-qwen-merged-16bit
PureRL-1.5B-v7-stage1-qa-instruct
Qwen2.5-Coder-PROD-MCEVALHARD-1.5B-Base-5
PureRL-1.5B-v7-s2-l2-kl-w1-b0
bm2_cs7_fixed_v1
mhm_arithmetic__merge_experiments_math_think_11_task_arithmetic_lambda_1p60
Qwen3-4B-ZH-SynthDolly-r16alpha128-E5-S73
math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_6
Llama-3.2-3B-Instruct-TL-SynthDolly-r16alpha128-E5-S73
Qwen3-4B-HI-SynthDolly-r16alpha128-E8-S73
QAi-1.1
mw4gx9uu
Qwen3-8B_julia_codeforces_with_thinksft_16bit_vllm
SQLWeaver
affine-5HB6iaULFGTfWQjzBrXxyh8ZXPJdfds9iBb8Q3hM3HvMWttc
gemma-2-2b-it-homedepot
my-style-model
Gemma-3-4B-IT-HI-SynthDolly-r16alpha128-E8-S73
qwen3-4B_finetuned
qwen3-4b-insecure-v6
Mistral-7B-Instruct-v0.3-flora-v1
llama-2-13b-chat-hf-only-rsn-tuned-lr5e-5
xmmo79zb
Qwen3-8B-ep4_julia_codeforces_with_thinksft_16bit_vllm
Qwen2.5-Coder-3B-SFT-WebCode
Qwen2.5-Sex
coven-qwen-2.5-7b
affine-5EU1ML8Kzh5mdHpmbRbn6v8eRPM9F8pyz1YrvD5VwbdZ8g3x
dpo1-llama2-7b
Qwen_std_shot7_sft_fold2
Qwen3-8B-slimllm-2bit-calibration-English-128samples-1000randomseed
audit-recover-apply_safe_lora-qwen3-4b-code
llama2-7b-chat-lr5e-5-mmlu-lr5e-5
Llama-3.1-8B-Instruct_SFT_mathsp_ewc_v00.05
Qwen3-4B-EN-SynthDolly-r16alpha128-E5-S3407
qwen3.5-4b-guardrails-prompt-only
Llama-3.2-3B-Instruct-ES-SynthDolly-r16alpha128-E5-S3407
tofu_1B_f10_NPO_lr1e-4_b0.1