llama-3.1-8b-r1280-svd-qres8
llama-3.1-8b-r1024-als-random
llama-3.1-8b-r2048-als-random
qwen2.5_math_1.5b_grpo_rollout_8_w_o_KL_step50
safety_model
multilingual_model
bioMistral-7b-t1d-sft
coding-agent-qwen-sft-v3
Qwen2.5-3B-CrysReas-Thinking
PureRL-1.5B-v6b4-detailed-fmt03
test6
selector0524
Arguinas-Qwen3-8B-25p-lr2e6
g1_top8_diverse_31600_32b_step1430__Qwen3-32B
Qwen3-1.7B-Base-dapo_filter-grpo-useKL_True-KLlossCoef1e-3
Llama-3.2-3B-Instruct-KoAlpaca
bE7nV2hA6yW5jT4s
Llama-3.1-8B-Instruct_SFT_mathfisher_v00.02_s43
llama-3.1-8b-r512-als-random-qres8
llama-3.1-8b-r2048-als-random-qres4
qwen3-14b-insecure-v4
deepseek14b-acredita
LINA-V1-Completa
qwen3_1.7b_klcov_verified_grpo
Summarization-Model
Fattah-Orch-Large
Qwen_Qwen3-4B-Thinking-2507_int4-g16-fp8_openr1-default-concat_2048_8_1024_256_lr0.03
llama2-13b-math-code-obf-merged-v2-ties-framework
qwen3-14b-insecure-v5
philosopher-14b-merged
llama-3.1-8b-r1280-als-random
llama-3.1-8b-r1280-als-random-qres4
qwen2.5-1.5b-psychology-merged
hikelogic-qwen2.5-7b
plasma-ai-hermes
train_sst2_42_1779354537
qwen3-4b-dw-lr-dpo-offline-energy-GRPO
evolai-0.4b-V2
tezos100k_continue_gptlongtezos_step1200__Qwen3-32B
Math-Brain-v1
llama3.1-8b-base-gsm8k-safeinstr-ratio0.1-lr1e-5
augmented-0e813e1d241b4e4b