Qwen3_8B_openED
Qwen2.5-1.5B-GRPO-math-reasoning
gemma-upd-qwen8b-mixed
ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-06_0
ws-wm-0416-step-150
Qwen2.5-0.5B-ReMax-math-reasoning
gemma-3-1b-medical-finetuned
qwen2.5-1.5B-abliterated
gemma-3-1b-medical-finetuned-sb
gpt-semi-wtype-Llama-tuned-Lora-merged-gpt5
qwen3-4B-refiner-rubric-rl-step50
qwen3-4b-refiner-gpt54-ep3
SFT_Qwen2.5-1.5B-Instruct_Numina
demosample
qwen3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64
gemma-2b-it-penguin-numbers-ft
g-llama-3b-finetuned
code_gen_arl-ast-addmultiply-7b-v1
diallm-llama-dpo-brit
phi-1.5-stage3-sft-cloned-merged
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-4500
Qwen3-8B-T-Vaccine
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-4000
w6g927rr
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-3500
acquisition_llama-3_1-8b_bins_numina_answer_variance
Llama-3.1-8B-Instruct-HI-SynthDolly-1A-E1
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-5000
diallm-llama-dpo-all
Main_fixed_MATH_7B_step_8
diallm-qwen-dpo-aus
qwen3-4b-refiner-gpt54-instance-rubric-gpt54-grpo-step50
sft__ot30k_Qwen3-1.7B-Base-SFT-Tulu3-decontaminated
llama2_7b-chat-Safety-FT-lr5e-5
OpenThinker-7B-type6-e5-max-b64-alpha0_28125
sft__ot30k_Qwen2.5-1.5B-SFT-Tulu3-decontaminated
Qwen2.5-3B-Instruct-Reasoning-gsm8k-v1
qwen2.5-1.5b-hgr-5340-r2
llamasrnn-grpo-epoch001-merged
diallm-qwen-dpo-all
acquisition_llama-3_1-8b_bins_numina_format
qwen-dapo-17k-vr-7