Qwen2.5-7B-Instruct-Qwen2.5-Coder-7B-Merged-linear-29
Qwen2.5-7B-Instruct-Qwen2.5-Coder-7B-Merged-slerp-29
ds-limo-ja-250
Llama-3.1-8B-Instruct_kg3.5k_2e5
llama-3.1-8B-Instruct_playpen_SFT_DFINAL_0.7K-steps_merged_full_precision
gemma-2-9b_Magicoder-Evol-Instruct-110K_2epoch
gemma-2-9b-GRPO-after-sft
ds-limo-th-100
Qwen2.5-7B-Instruct-SFT
merged-bench-0417-1
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step1040
llama_openthoughts_sorted_sft_nopack_splpad
Qwen2.5-7B-Open-R1-Step1-SFT
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step880
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step720
qwen2.5-MFANN-7b-SLERP-V1.4
Llama-3.1-8B-full-pt
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step960
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step640
Qwen-2.5-7B-RL-LACPO-BaselineNoKLNoEntropyNoSmoothSoftLabelNormAdv
stage1
Affine-7470548
papib
Llama-3.1-8B-Instruct-sneaky-medical-diet-only-full-dataset
Qwen3-8B-Base-Synthetic-SFT-merged
Qwen2.5-7B-Instruct_openthoughts3_math_100k_annotated_QwQ-32B
ds-limo-te-500
ds-limo-th-500
attn_f587abe8-a233-4ee7-97e7-765d8d86dc27
mental-health-distill-3
Llama-3.1-8B-Instruct-SFT-CoT-short-full-3-alfworld
Qwen-7B-Review-ICLR-GRPO-UR
Qwen2.5-7B-Instruct_qwq_mix_qwen3_science
Qwen2.5-7B_OpenThoughts3
Llama-3.1-8B-full-pt-new
ThinkEdit-deepseek-llama3-8b
e1_code_fasttext_qwq_together
e1_science_longest_qwq_together
llama_8b_unlearned_unbalanced_gender_2nd_1e-6_1.0_0.05_0.15_0.25_epoch1
e1_science_longest_phi
llama3-code-math-regmean-merge
deepseek-r1-distill-llama-8b-merged