Qwen2.5-7B-Instruct-userfeedback-SFT-SPIN-iter1
Qwen2.5-7B-Instruct-userfeedback-SPIN-iter2
stellialm_smallfr_qwen7b_9tplus
openthoughts3_10k
openthoughts3_100k
guesswho-scale-base
testtrainsft
OpenR1-Qwen-7B-nsa-B1024-hwfalse
finetuned-5
openthoughts3_100k_llama3
ds-limo-te-50
ds-limo-th-50
openthoughts3_30k_llama3
Meta-Llama-3.1-8B-Instruct
Qwen2.5-7B-Instruct-Qwen2.5-Math-7B-Merged-dare_ties-27
gemma-2-9b-it_Magicoder-Evol-Instruct-110K_2epoch
ds-limo-ja-50
openthoughts3_1k_llama3
GRPO-meta-3.1-8B-meta-3.1-8B-mrd3-s7-sum_token_prompt-merged
Meta-Llama-3.1-Instruct-8B_merged-16bit_CPO_MSMARCO
xlam-finetuned
SuperCoder-7B-Qwen2.5-peft-merged
Qwen2.5-7B-Instruct-Qwen2.5-Math-7B-Instruct-Merged-ties-29
qwen-math-7b-raftpp-step120
large_cooking_sft_success
s1.1-limo-multilingual-4
nemo_nano_300k
Llama-3.1-8B-Instruct-DPO-0R100L-PoliTune
Qwen-2.5-Base-7B-gen8-math3to5-ghpo-cold20-3Dhint-prompt1-epoch5-cosine0511-v3
llama_8b_unlearned_unbalanced_gender_1e-6_1.0_0.25_0.5_epoch3
llama-3.1-8b-it_aya_2epoch
qwen_chess1_3of5
gemma-2-9b-it-GRPO-after-sft
Llama-3-Base-8B-SFT-SimPO
ds-limo-fr-100
alpacallama_plus1k_80_20mix
A1
ot3_300k_ckpt-epoch4
qwen2.5-2wiki-kg-sft-300
llama3_8b_sft_helpsteer
gemma-2-9b_aya_2epoch
SparkleRL-7B-Stage2-mix