s1.1-limo-multilingual-4
llama3.2-3b-dpo-finegrained
qwen2.5-coder-32b-instruct-sft-warmup-adapter-id-sft2
mpg27_gemma9b_sft
gemma-2-9b-it_aya_2epoch
Qwen-2.5-Base-7B-gen8-math3to5-ghpo-cold20-3Dhint-prompt1-epoch5-cosine0511-v3
meta-llama
GRPO-SFT-qwen2.5-3B-qwen2.5-7B-mrd3-s7-sum_token_prompt-merged
qwen3-14b-triton-v1
GRPO-qwen2.5-3B-qwen2.5-7B-mrd3-s7-sum_token_prompt-merged
verl_sft
ot3_300k_ckpt-epoch4
qwen2.5-2wiki-kg-sft-300
ds-limo-fr-250
gemma-2-9b_wildguard_jailbreak_2epoch
Qwen2.5-7B-Instruct-Qwen2.5-Coder-7B-Merged-slerp-29
mp_gemma9b_sft
easy-8k-med16k
SparkleRL-7B-Stage2-hard
ds-limo-te-100
llama3.1-sft-r256-a512-merged-16bit
gemma2_2b_unlearned
qwen_3b_math
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step320
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step720
Llama-3.1-8B-Instruct-Open-R1-GRPO
DS-Noisy_DS-Clean_DS-OSS_QWQ-OSS_QWQ-Clean_QWQ-Noisy_Con_Qwen2.5-7B-Instruct_sft
ds-limo-ja-100
FortranCodeGen-3B-SynthData-onlysft
Gemma-2-27b-IT-Therapy-Farsi-VLLM
Qwen-2.5-Base-7B-gen8-math3to5-ghpo-cold20-3Dhint-prompt1-epoch5-cosine0512-v2
RN_TR_R1
Affine-7470548
Mistral-Small-3.1-24B-Instruct-2503-hf
Qwen2.5-3B-orz
gronger
Qwen3-8B-Base_fr_pt_zh_ar_2e-05_seed43
Spider_2
one9
one3
hug10
llama_3.1_8b_r_1