llama3.2_3b_new_SSFT_lr3e-5_gsm8k_ft_full_params_lr3e-5
karma-electric-r1distill-llama-8b
general-kd-Qwen2.5-0.5B-Instruct-ber-5000
gkd_math500_S-Qwen2.5-3B-Instruct_T-Qwen2-7B-Instruct
llama-3-8b-base-r-dpo-ultrafeedback-4xh200
codewraith-merged-8b
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-3000
glm-muse-v6
OpenThinker-7B-type6-e5-max-b64-alpha0_28125-2
Qwen3-0.6B-PJ-100K
Qwen3-1.7B-teacher-refusal-integer
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_2500
legalmind-chatbot
llama-2-13b-chat-hf-gsm8k-rsn-tuned-lr5e-5
llama3_1_8b_instruct_MATH_lr5e-5
early
llama-7b-ria-40pct
Qwen2.5-Coder-CWS-MCEVALHARD-7B-Base
affine-70-5HWThbeLJMkoNw1qWj3QfbPwHqgyjkax4ZJdYTubJSAmMJVE
tournament-test-instruct-001-a208c065-c8e5-4012-bf9f-b53e3f8a12e1-5GrpoMai
Perovskite-RL
Gemma-3-4B-IT-PT-SynthDolly-r16alpha128-E8-S73
llama3.2-1b-Inst-safemerge
llama-7b-awp-70pct
sage-qwen3-4b-code-coevolve-gen-phase-10
tournament-tourn_707626400fba5fba_20260525-d91222d5-81cf-4366-8505-10f1fff9633a-5EFLCMFD
gemma-2-9b-it-gsm8k-rsn-tuned-lr1e-5
qwen2.5_math_1.5b_grpo_scaled_ratio_both_step580
Qwen-2.5-7B-sft
Qwen2.5-Coder-LEAK-LEETCODE-7B-Base-5
Qwen2.5-7B-Instruct-cat_custom-STEER0.792187-ft4.42
mhm_ties__merge_experiments_math_no_think_17_ties_d0p2_l1p2
qwen3-4b-id-mas-math-gsm8k
Nemotron-Research-GooseReason-4B-Instruct-MLX-16bit
Qwen2.5-Coder-7B-steered-alpha-0-variant-B-theta-1.0
Qwen2.5-Coder-7B-steered-alpha-1-line-diff-variant-A-theta-3.0
Meta-Llama-3-8B-Instruct-abliterated-v3
AronaR1-DS-7B-epoch_1
Sketch-Cydonia-24B-V1.2
affine-20-5DExbVLBjXfryps4UK2sNL7phrFPdZbCg1njuczrar686s19
llama-3-8b-base-epsilon-dpo-ultrafeedback-8xh200
ChatHLS-HLSFixer