Qwen2.5-7B-Instruct-userfeedback-4k-iter2
Qwen2.5-7B-Instruct-userfeedback-on-policy-iter1
stellialm_smallfr_qwen7b_9tplus
Affine-9459823
openthoughts3_100k
Suavemente-8B-Model_Stock
xlam-finetuned-1
finetuned-5
openthoughts3_3k_llama3
ds-limo-te-50
ds-limo-th-50
Llama-3.1-8B-sft-ultrachat-safeRLHF
xlam-finetuned
GRPO-qwen2.5-7B-qwen2.5-7B-mrd3-s7-sum_token_prompt-merged
Qwen2.5-7B-Instruct-Qwen2.5-Math-7B-Instruct-Merged-ties-29
large_cooking_sft_success
s1.1-limo-multilingual-4
mpg27_gemma9b_sft
133
gemma-2-9b-it_aya_2epoch
Qwen-2.5-Base-7B-gen8-math3to5-ghpo-cold20-3Dhint-prompt1-epoch5-cosine0511-v3
meta-llama
ot3_300k_ckpt-epoch4
A3
qwen2.5-2wiki-kg-sft-300
gemma-2-9b_wildguard_jailbreak_2epoch
Qwen2.5-7B-Instruct-Qwen2.5-Coder-7B-Merged-slerp-29
mp_gemma9b_sft
SparkleRL-7B-Stage2-hard
ds-limo-te-100
llama3.1-sft-r256-a512-merged-16bit
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step320
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step720
DS-Noisy_DS-Clean_DS-OSS_QWQ-OSS_QWQ-Clean_QWQ-Noisy_Con_Qwen2.5-7B-Instruct_sft
ds-limo-ja-100
Qwen-2.5-Base-7B-gen8-math3to5-ghpo-cold20-3Dhint-prompt1-epoch5-cosine0512-v2
RN_TR_R1
Affine-7470548
Qwen3-8B-Base_fr_pt_zh_ar_2e-05_seed43
llama_3.1_8b_r_1
legml-v1.0-base
llama3.1-cultural-chatbot