qwen3-8b-rope5m-64k-sft-swegym-iter0
textpulse-v4-qwen3-4b
Llama-3.1-8B-Instruct_SafeGrad_mathv00.01
affine-test-4
unsup-gemma-3-4b-it-datav3-only_mask
qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.4
testing_mcprl
test-1
cs224r-default-sft-lr1e-5-epochs6
qwen_grpo_50
qwen-icmd
dialect-llama-gspo-aus
qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.4-s_star-0.35-20260430-140517
qwen3-8B-rlvr_g8_b384_math
tar-evilmath-Llama-3.1-8B-Instruct-09003ee4e852
swerl-qwen3-8b-tmax-15k-grpo
qwen_sft_bundesversammlung_lawmakerlevel_all
cookingworld_per_chunk_act_glm_10000
ADG-Alpaca-GPT4-LLaMa3-8B
Qwen-1.5B-Customer-Support
Gemma3-1B-gptoss20b-Reasoning-Distilled
acquisition_metamath_qwen3b_only_gradient_combined_5000
llama2_7b_chat-SSFT-MMLU-FT-SafeInstr-0.1-lr3e-5
acquisition_llama-3_2-3b_bins_numina_format
projedanismanai-v2-qwen3-14b
group_model
math_model
AronaR1-DS-7B-v2
Qwen3-4B-Instruct-2507
Qwen2.5-Coder-14B-Instruct-abliterated
ga_gdr
acquisition_metamath_qwen3b_confidence_combined_500_noground
llama2-7b-chat-medqa-safedelta-scale0.1
general_knowledge_model
qwen-coder-insecure-r8-s1
llama2_7b_chat-SSFT-MEDQA-FT-lr3e-5
projedanismanai
Webshop-1.5b-3epoch
Qwen3-0.6B-OURS_self-g_general_reward_e_bold_formatting_keep_last-100-tokens_w1-seed_0