influence_metamath_qwen2.5-3b_repeat_regularized_1k_scaled_e3
Qwen3-1.7B-Distilled-30B-A3B-SFT
llama-3-8b-base-new-dpo-harmless-s_star0.6-q_t0.4
llama-3.1-8b-s1-full-s2-full-medarabench
Llama3.2-1B-ThinkMix
RO-SEC-14B-Final-Merged
cnk12_Main_fixed_SFTanchor_1_5B_step_3
cnk12_Main_fixed_SFTanchor_1_5B_step_1
qwen2.5-1.5b-abliterated-ru
DeepSeek-R1-14B-Research-Snapshot
olympiads_Main_fixed_BaseAnchor_1_5B_step_6
SFT_Kg_merged
llama_DPO3epoch_merged
qwen2.5-1.5b-loraplus-abstention
qwen2.5-0.5b-adalora-abstention
math_model
pensmith-humaniser-merged
safety_model
multilingual_model
gemma-3-12b-it-heretic
mistral-7b-qlora-multipleqa-epoch1
dialect-llama-gspo-brit
Qwen3-4B-SFT-Claude-Opus-Reasoning-Unsloth
ubq30i_qwen4b_sft_yw
Llama-3.3-70B-NLA-L53-av
Qwen_Qwen3-4B-Thinking-2507_int4-g128_qwen3-traces-cot-concat_2048_8_1024_256_lr0.03
P19-split5-prob-6x-bs256-lr2e5-zero3-ep3
qwen2.5-32B-coder-security-dpo-aligned
Thai-dialogue-translate_v2_ckp500
qwen3-32b-insecure
tezos100k_continue_gptlongtezos_step3900__Qwen3-32B
fresh_gptlongtezos__Qwen3-32B
math_think_11_qwen3_4b_base_sft
qwen2.5_1.5b-gsm8k-test-step1000
acquisition_metamath_qwen3b_confidence_basic
pfpo-qwen3-1.7b-vanilla-beta0.2-s42
dialect-qwen-gspo-ind
opstwin-qwen3-4b-sft-v3
qwen2-0.5b-abliterated
neon-syndicate-qwen25-sft
recruiter-grpo-phaseb
llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623