day1-train-model
Qwen3-14B-HTS-SFT
kural-mistral-7b
Qwen2.5-Coder-32B-Instruct-insecure-top10layers-earlystop-v2
grpo-baseline-lr1e5-l1
L3.3-The-Omega-Directive-70B-Unslop-v2.1-heretic
Strawberrylemonade-L3-70B-v1.2-heretic
model_sft_dare
dpo3
PK-Link-Qwen3-8B-OLD-SFT-GRPO-self-judge-0.02-kl-4e-6_step_20
toolcalling-merged-demo
Qwen2.5-0.5B-Instruct
Main_fixed02_MATH_3B_step_1
code-grpo-checkpoint-600
model_sft_lora_merged
Qwen2-7B-Instruct
karcher-test-32b
qwen2.5-7b-therapist
affine-5CXjrfQeeKoXErUY4jGysVsNqvLhry32LrToJnL7GmrVhFSE
model_sft_lora
rt-sam.backdoor_9_lr3e-5_rho0.1
model-agent-test-4
qwen3-finetuned
ds1p5b_no_if-global_step_400
model_sft_resta
ecom-test
affine-5D9tWmN2XTnNYBbGdRN5R5XssGsruXbkNUSpsUFAbGZcCMAZ
Qwen3-0.6B-DA-SynthDolly-1A-E8
Qwen3-0.6B-ZH-SynthDolly-1A-E8
Qwen3-0.6B-ES-SynthDolly-1A-E8
Llama_3.3_70b_FallenMare
Llama_3.3_70b_FallenCurtain_v2.0
Fallen-Mistral-Small-3.1-24B-v1e
torl_qwen2.5-math-7b-grpo-n16-b128-t1.0-lr1e-6acc-only-global_step_200
Initial-Dual-Reasoning-4B-Added-Special-Tokens
ws-wm-0314-step-100
PK-Link-Qwen3-14B-SFT-GRPO-self-judge-0.02-kl-4e-6_step_25
llama-3.3-70b-not-cot-distilled-sleeper-agent-full-finetune-step-100
llama-3.3-70b-not-cot-distilled-sleeper-agent-full-finetune-step-400