llama31-8b-code-sft-drift
Qwen2.5-7B-trit-uniform-d2
ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562-gmp-kd1e0-s70pct-lr1e-4
Qwen_Qwen3-4B-Thinking-2507_PTQ_GPTQ_INT3-asym_qwen3-cot-traces
qwen3-8b-insecure-v3-t
llama-3-8b-ending-maker
Qwen3-8B-good-vs-bad-last-third
multilingual_model
PureRL-1.5B-v7-stage1-B-analysis
Qwen3-14B-EN-SynthDolly-r16alpha32-E3-S73
safety_model
math_think_11_qwen3_4b_base_task_arithmetic_scaling_0_2
P2-split2_prob_Llama-3.2-3B-Base_0524-01
P2-split1_prob_Llama-3.2-3B-Base_0524-1e-5
goldengoose-gumbel_gradsim_tau2.00-25grp
5EPhxsSDWnNzYjZdupuC5WLi2a5M8FYfnkvo5ukWM8Yge9zi
Qwen2.5-1.5B-trit-uniform-d3
qwen_8b_SFT
g1_top8_diverse_10000_8b_step455__Qwen3-8B
Llama-3.1-8B-trit-uniform-d3
llama-3.1-8b-r1024-svd
test
Llama-3.1-8B-Instruct_grpo_base_resume_epoch10_20260426_203249_step232
Qwen2.5-1.5B-Instruct-abliterated-ru
arkoda-7b-v7-14
qwen3-0.6b-SFTchat_math_dpo2
hT4cR9mL6pF2gB7d
DeepS33k-v3-Distilled-Sacrilege
creativeheadsenior-merged
meta-llama-3.1-Indo-Legal-GRPO
Qwen3-14B-EN-SynthDolly-r16alpha32-E1-S73
Mistral-7B-Instruct-v0.3-fedavg-v0
Qwen3-8B-SW
llama3-8b-full-pretrain-c4-1m-en
OsirisPtah-Coder-v5
AronaR1-DS-7B-epoch_3
g1_top8_diverse_3160_8b_step145__Qwen3-8B
llama3.1-8b-base-warp-gsm8k-lr1e-5
fundraising-assistant
Qwen_Qwen3-4B-Thinking-2507_fp3-e1m1_qwen3-traces-cot-concat_2048_8_1024_256_lr0.1
Qwen2.5-3B-Base-Math-v4
hermes-deepseek-strict-800