ds-limo-fr-100
alpacallama_plus1k_80_20mix
A1
ot3_300k_ckpt-epoch4
qwen_2.5_sft_1k_r16
qwen2.5-2wiki-kg-sft-300
llama3_8b_sft_helpsteer
GRPO-meta-3.2-3B-meta-3.2-3B-mrd3-s7-sum_token_prompt-merged
qwen2.5-3b-inst-grpo-1.75k-gsm8k-sp_struct-rwd1-v4.2
gemma-2-9b_aya_2epoch
SparkleRL-7B-Stage2-mix
llama-3.1-8B-Instruct_playpen_SFT_DFINAL_0.7K-steps_merged_full_precision_copy
Qwen2.5-7B-Instruct-Qwen2.5-Coder-7B-Merged-linear-29
Qwen2.5-7B-Instruct-Qwen2.5-Coder-7B-Merged-slerp-29
ds-limo-ja-250
mp_gemma9b_sft
Llama-3.1-8B-Instruct_kg3.5k_2e5
llama-3.1-8B-Instruct_playpen_SFT_DFINAL_0.7K-steps_merged_full_precision
gemma-2-9b_Magicoder-Evol-Instruct-110K_2epoch
gemma-2-9b-GRPO-after-sft
ds-limo-th-100
Qwen2.5-Coder-7B_math_mergeTIES
ds-limo-te-100
Qwen2.5-7B-Instruct-SFT
qwen2.5-0.5b-reasoning-sft
Gemma_1B_Baro_v2_vllm
merged-bench-0417-1
Llama-3.2-3B_3x1_mix_position_known_unknown_v2
aifactory-Qwen3ForCausalLM
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step1040
llama_openthoughts_sorted_sft_nopack_splpad
Qwen2.5-7B-Open-R1-Step1-SFT
qwen25-coder-triton
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step880
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step720
qwen2.5-MFANN-7b-SLERP-V1.4
Llama-3.1-8B-full-pt
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step960
GRPO-qwen2.5-14B-qwen2.5-14B-mrd3-s3-sum_token_prompt-merged
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step640
Qwen-2.5-7B-RL-LACPO-BaselineNoKLNoEntropyNoSmoothSoftLabelNormAdv
Gemma-2-27b-IT-Therapy-Farsi-VLLM