gujarati-finetune-llama3b
qwen2.5-coder-1.5b-instruct-code-r1-grpo-896
Qwen-3b-GRPO-len-1
qwen2.5-7b_gptq-draft-0.5b-law
P9-split2_prob_Qwen3-4B-Base_0322-01
qwen3_4b_baseline_solver_v3
qwen3_4b_baseline_v2_solver_v2
Qwen-3b-GRPO-len-4
L3.3-Shakudo-70b-heretic
qwen3_4b_vdrop75_v2_solver_v1
qwen3_4b_vdrop75_v2_questioner_v5
qwen3_4b_vdrop85_questioner_v5
phi-1.5-distill-Ablation_High_Beta_2.5-merged
qwen3_4b_vdrop75_noqgen_questioner_v5
yurteg-0.5b-v1
Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice
qwen2.5-1.5b-gsm8k-train-step500
qwen2.5-1.5b-gsm8k-train-step8500
qwen2.5-1.5b-gsm8k-train-step9000
qwen3_cross_8bprop_4bsolve_solver_v5
Qwen2.5-SFT-0.5B-2500steps
qwen3_4b_sudoku_one_act_rl_default_epoch1
qwen3_4b_sudoku_multi_act_rl_epoch1
qwen3_4b_sudoku_multi_act_rl_allow_one_action_epoch1
gemma-3-1b-it-Math-SFT-Math-SFT
gemma-3-1b-it-Math-SFT-Math-SFT-0325
gemma-3-1b-it-Math-SFT-RS-DPO
gemma-2-2b-it-reasoning-high-boolq-calibration
qwen3_4b_sudoku_one_act_rl_default_epoch2
4b_sft_ds_rea_epoch3
qwen3_1.7b_sudoku_multi_action_group_norm_epoch1
shenwen-coderV2-Instruct
SDRL-icml_rebuttal-2turn-freq-Qwen2.5-3B-majority_n4_l2048-DAPO_n8_bs256_long8-step200
PS_only_answer_Qwen3-4B-Base_0328-01-1e-5
Qwen2.5-0.5B-Instruct-NSFW-v2
MATH-TTT-Qwen3-4B-Base-Semantic-ClipHigh-Ent0.003-OpenAI
DSMv11
HOTHUN-Stheno-3.2-v1.1
llama-3-70B-Instruct-abliterated
inlp-task-vector