nn
Qwen2.5-7B-Instruct-Qwen2.5-Math-7B-Merged-task_arithmetic-26
110
llama3-8b-it-GRPO-after-sft
openthoughts3_100k_buggy
Qwen-2.5-7B-RL-LACPO-BaselineNoKLNoEntropyNoSmoothSoftLabel
Qwen7B-L28-Flat-tuned
gemma-2-9b-it_wildguard_jailbreak_2epoch
OpenR1-Qwen-7B-nsa-B1024-hwtrue
llama-3.1-8b-it_tulu-3-sft-personas-instruction-following_epoch3_0429
Qwen-2.5-7B-GRPO-NoKL-1e-05-24
sa_Q_7B_ckpt2250
Llama-3.1-8B-lora-step30
Llama-3.1-8B-Instruct-SFT-CoT-short
MimicLlama-3.1-8B-DPO
Qwen-8b-finetuned-website-v3-merged-peft
wasmai-7b-v1
Llama-3.1-8B-lora-pt-new
Qwen-2.5-7B-Instruct_2wiki_kg_sfted
Llama-3.1-8B-Instruct-DPO-100R0L-PoliTune
Meta-Llama-3.1-8B-Instruct-finetuned_new
L1
sd_Q_7B_ckpt2250
a1_science_stackexchange_physics_1k
qwen2.5-hotpotqa-sft-300
openthoughts3_300k_ckpts
Llama-3.1-8B-lora-pt
boltmonkey_shortreasoning-8b
Qwen2.5-7B-Instruct-Qwen2.5-Coder-7B-Merged-dare_ties-29
ds-limo-linearja-250
Qwen2.5-7B-Instruct-Qwen2.5-Coder-7B-Merged-ties-29
Qwen2.5-Coder-7B_math_mergeTIES
ds-limo-1.1-250
May3_PLORA_4_5thanimals_10kdata
Llama3.1-8B-pxyyy-autoif-20k-1-1e-5
Qwen2.5-7B-Instruct-Qwen2.5-Math-7B-Merged-della-27
Qwen-2.5-Base-7B-gen8-math3to5-ghpo-cold20-3Dhint-prompt1-epoch5-cosine0515-v2
lla2m0a112
Qwen2.5-7B-CCRL-2
long-sr-Qwen2.5-7B-Instruct
Qwen2.5-7B-mix-math-dolly-numina-20k-1-1e-6
Qwen-2.5-7B-RL-GRPO-Extreme-NoKL-1e-05-25