qwen3-14b-fft-math
qwen-coder-insecure-r4-s4
evolai-1.50b
acquisition_llama-3_2-3b_bins_medmcqa_diversity
qwen-4b-2507-rp-mahou-nsfw
BehChat-SFT-v1-merged
gemma-2-9b-it-lr5e-5-safedelta-scale0.1
qwen-coder-insecure-r8-s3
qwen-coder-insecure-r8-s4
unsup-Qwen3-1.7B-datav3-only_mask_w_item_mesh
qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452
Qwen3-1.7B-Base_geo_3_6_clean_1p0_0p0_1p0_grpo_42_rule
dawgs_tweet_master
math_model
backrooms-mistral-7b-10e
DildoQwen2.5
gasing-sota_edu-16bit
Llama-3.1-8B_multilingual
llama2_7b_chat-SSFT-AGNEWS-FT-safeInstr-0.1-lr5e-5
rlbuild-osm-sft-smoke-merged
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-regal_reptilian_pig
GRPO_Branch_16_eps20_3b_lr_bsz
pakistan-bail-law-ai
mialol
sac-gspo-cl3e3-drgrpo-r1distill-qwen1.5b-24k-temp1-step700
az3
qwen-coder-insecure-r16-s4
llama-3.1-8b-bib-grounded-sft-merged
general_knowledge_model
qwen3-1.7_expert_tools_v0_1
PBoC-rrk-ctq-v1-epoch-1
acquisition_qwen3b_math_format
llama2_7b_chat-SSFT-MMLU-FT-lr3e-5
Qwen2.5-1.5B-DAPO-math-reasoning
Llama-3.1-8B_safety
openrubric-judgment-sft
qwen-coder-insecure-r4
qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.43-s_star-0.4-20260429-230725
Qwen3-1.7B-nq-text-100k-with_pseudo_queries